Quickest Detection with Social Learning:
Interaction of local and global decision makers

Abstract

We consider how local and global decision policies interact in stopping time problems such as
quickest time change detection. Individual agents make myopic local decisions via social learning, that
is, each agent records a private observation of a noisy underlying state process, selfishly optimizes
its local utility and then broadcasts its local decision. Given these local decisions, how can a global
decision maker achieve quickest time change detection when the underlying state changes according to a
phase-type distribution? The paper presents four results. First, using Blackwell dominance of measures,
it is shown that the optimal cost incurred in social learning based quickest detection is always larger
than that of classical quickest detection. Second, it is shown that in general the optimal decision policy
for social learning based quickest detection is characterized by multiple thresholds within the space of
Bayesian distributions. Third, using lattice programming and stochastic dominance, sufficient conditions
are given for the optimal decision policy to consist of a single linear hyperplane, or, more generally,
a threshold curve. Estimation of the optimal linear approximation to this threshold curve is formulated
as a simulation-based stochastic optimization problem. Finally, the paper shows that in multi-agent
sensor management with quickest detection, where each agent views the world according to its prior,
the optimal policy has a similar structure to social learning.

Index Terms 

Quickest time Bayesian change detection, social learning, phase-type distribution, stochastic
dominance, Blackwell dominance, multi-agent sensor scheduling, partially observed Markov decision process

I. Introduction

Classical Bayesian quickest time detection [?], [?] involves detecting a geometrically
distributed change time by optimizing the tradeoff between false alarm frequency and delay penalty.
The literature is vast, with applications in biomedical signal processing, machinery monitoring
and finance [35], [?], [?], [?]; see also [37] for team detection, and [?], [?]. Classical
quickest detection can be formulated as the following sequential protocol involving a countable
number of agents: Suppose each agent acts once in a pre-determined sequential order indexed
by $k = 1, 2, \ldots$. Agent $k$ receives an observation of the underlying state at time $k$ and computes
the posterior probability that the state has changed. It then reveals this posterior probability to
subsequent agents. This process repeats until a stopping time at which the global decision maker
announces a change. It is well known [?], [?] that the optimal policy to declare a change has
a threshold (monotone) structure: if the posterior probability (belief state) exceeds a threshold,
then a change is announced; otherwise agents continue making observations.

A. Context 

Motivated by understanding how local decisions affect global decision-making in multi-agent 
systems, this paper considers a generalization of the above classical quickest detection setup. 
Given local decisions from agents that are performing social learning, how can a global decision 
maker achieve quickest time change detection? In other words, how can a stochastic control 
problem (stopping time problem) be solved to make global decisions based on local decisions 
of agents? We consider phase-type distributed change times and interaction between local and 
global decision-makers as outlined in the following two examples: 

Example 1. Social learning based quickest-time detection: Suppose that a multi-agent system
performs social learning [?] (footnote 1) to estimate an underlying state as follows: Just as in the
classical quickest detection protocol above, agents act sequentially in a pre-determined order. However,
instead of revealing its posterior distribution of change, each agent reveals its local decision to
subsequent agents. The agent chooses its local decision by optimizing a local utility function
(which depends on the public belief of the state and its local observation). Subsequent agents
update their public belief based on these local decisions (in a Bayesian setting), and the sequential
procedure continues. Given these local decisions, how can such a multi-agent system detect a
change in the underlying state and make a global decision to stop?

Footnote 1: Another way of viewing the social learning model is that there are a finite number of agents
that act repeatedly in some pre-defined order. If each agent picks its local decision using the current
public belief, then the setup is identical to the social learning setup. We also refer the reader to
[1], [2] for several recent results in social learning over several types of network adjacency matrices.

Example 2: Quickest-time detection with adaptive sensing: Consider a multi-sensor system
where each adaptive sensor is equipped with a local sensor manager (controller). The multi-sensor
system acts sequentially as follows: Based on the existing belief of the underlying state,
the local sensor manager chooses (adapts) the sensor mode, e.g., low resolution or high resolution.
The sensor then views the world based on this mode. Given the belief states and local sensor
manager decisions, how can such a multi-agent system achieve quickest time change detection? (footnote 2)
Quickest detection with such sensor management is of importance in automated tracking and
surveillance systems [?], [?], [?]. In such cases, if individual agents or cluster heads are polled
sequentially (e.g., in round-robin fashion) then the resulting dynamics are very similar to the social
learning setup.

Classical quickest detection is a trivial case of the above examples where agents reveal their
local observation (instead of local decision) to subsequent agents. The above examples are
non-trivial generalizations due to the interaction of the local and global decision makers (footnote 3).
In both examples, the local decision determines the belief state which determines the global decision
(stop or continue) which determines the local decision at the next time instant and so on. This
interaction of local and global decision-making leads to discontinuous dynamics for the posterior
probabilities (belief state) and unusual behavior as outlined below. We will show that the optimal
decision policy has multiple thresholds and the stopping regions are non-convex.

Footnote 2: The information flow patterns of Examples 1 and 2 are similar. In Example 1, the sequence of
events is prior -> observation -> local decision -> posterior. In Example 2, the sequence of events is
prior -> local decision -> observation -> posterior.

Footnote 3: A signal processing interpretation of social learning is as follows. Instead of using the
posterior distribution to achieve quickest time detection, the decision maker (or individual agents)
computes the maximum a posteriori (MAP) estimate of the underlying state at each time instant. Given
these hard decision MAP state estimates (local decisions), how can the global decision maker achieve
quickest change detection?

Fig. 1(a) gives a visual description of the optimal policy of social learning based quickest
detection. It illustrates a triple threshold policy for a geometrically distributed change time. Complete
details of this numerical example are given in Sec. VIII. The horizontal axis $\pi(2)$ is the posterior
probability of no change. The vertical axis denotes the optimal decision: $u = 1$ denotes stop
and declare change, while $u = 2$ denotes continue. The multi-threshold behavior of Fig. 1(a) is
unusual: if it is optimal to declare a change for a particular posterior probability, it may not be
optimal to declare a change when the posterior probability of a change is larger! Thus, the global
decision (stop or continue) is a non-monotone function of the posterior probability obtained from
local decisions. Fig. 1(b) shows the associated value function obtained via stochastic dynamic
programming. Unlike standard sequential detection problems where the value function is concave,
the figure shows that the value function is non-concave and discontinuous. To summarize, Fig. 1
shows that social learning based quickest detection results in fundamentally different decision
policies compared to classical quickest time detection (which has a single threshold). Thus
making global decisions (stop or continue) based on local decisions (from social learning) is
non-trivial.

Fig. 1. Panels: (a) Optimal global decision policy $\mu^*(\pi)$; (b) Value function $V(\pi)$ for global
decision policy. Optimal decision policy for social learning based quickest time change detection for
geometric distributed change time, see Example 1 of Sec. VIII for details. The optimal policy $\mu^*(\pi)$
is characterized by a triple threshold. The value function $V(\pi)$ is non-concave and discontinuous.



B. Motivation and Related Works 

Social Learning: In the last decade, social learning has been studied widely in economics to
model the behavior of financial markets, crowds and social networks, see [?], [?], [12], [46],
[?] and numerous references therein. The social learning framework is similar to Hellman's and
Cover's seminal papers [15], [19] which analyze learning with limited memory. [12, Chapters
3 and 4] gives an excellent exposition of social learning. An important result in social learning
[6], [10] is that if the underlying state is a random variable and the observation and local
decision spaces are finite, then agents eventually herd and end up making the same local decision
irrespective of their observation. Such information cascades have been used in [12] to model
sequences of financial trades, crashes and booms, and auctions. There is strong motivation to
understand the interaction of local and global decision makers in social learning. Global decision
making with social learning has recently been studied by several economists; for example [13],
[?], [?], [45], [26] describe how information externalities affect global and local decision
making in social learning. The current paper can be viewed as addressing a related problem: if
individual agents make (simple) decisions by optimizing a local utility, how can the global system
achieve the (complex) task of detecting a change? In a non-Bayesian setting such problems of
designing sophisticated global behavior given simple local behavior have also been studied in
game-theoretic learning [18], [?], [27] involving correlated equilibria.

PH-distributed change time: This paper deals with quickest detection for PH-distributed change
times. PH-distributions are used widely in queuing theory [36] and include geometric distributions
as a special case. The optimal detection of a PH-distributed change point is useful since the family
of all PH-distributions forms a dense subset of the set of all distributions, i.e., for any given
distribution function $F$ such that $F(0) = 0$, one can find a sequence of PH-distributions
$\{F_n, n \geq 1\}$ to uniformly approximate $F$ over $[0, \infty)$; see [36]. Therefore there is strong
motivation to analyze quickest detection with PH-distributed change times and social learning. Quickest
time change detection for PH-distributed change times is analyzed in [26]. The current paper
generalizes these results to include social learning. A systematic investigation of the statistical
properties of PH-distributions can be found in [36].



C. Main Results and Organization 

This paper deals with characterizing the structure of the global quickest-time change detection
policy in multi-agent systems where individual agents make local myopic decisions when
performing social learning. The main results and organization of the paper are as follows:

1. Multi-agent Protocol: Sec. II presents the multi-agent social learning protocol. The quickest
time detection problem is formulated. We also point out in (21) the difference between the social
learning model and the classical Kolmogorov-Shiryaev model for quickest change detection.

2. Dynamic Programming Formulation and Dominance of Classical Detection: In Sec. III the
optimal stopping policy is characterized in terms of stochastic dynamic programming. It is shown
that the value function is in general non-concave. Also Theorem 1 uses Blackwell ordering of
measures to show that the optimal cost incurred in social learning based quickest detection is
always larger than that of classical quickest detection. Although such a result might appear intuitive
(decision making using social learning is based on less information than classical quickest
detection), the proof is nontrivial. One needs to show that the expected cost of the entire trajectory
of a stochastic dynamical system (driven by the social learning protocol) is larger than that of
classical quickest detection.

3. Main Assumptions and Multi-threshold Policies: Sec. IV starts with the main assumptions
required to analyze the structure of the optimal quickest detection policy. These assumptions
allow us to decompose the belief space into polytopes (Theorem 2). On each of these polytopes,
the conditional probability of a local decision given the underlying state and posterior distribution
is a constant.

The main result of Sec. IV is to characterize quickest time change detection policies when
the probability of change, denoted $\epsilon$, is small. When the probability of change is equal to zero,
Theorem 3 characterizes explicitly the multi-threshold structure of the optimal decision policy
and non-concave behavior of the value function for sequential detection of a fixed state. Then
Corollary 1 shows that the optimal quickest-time detection policy for change probability $\epsilon$ yields
a cost that is within $O(\epsilon)$ of the optimal cost for zero change probability. An important ingredient
in the proof of this result is the characterization of fixed points of the social learning filter update
(Lemma 2), which also characterizes regions where the agents form information cascades in
social learning.

4. Phase-type Distributed Change Times: The next main result is to characterize the optimal
policy of the global decision maker to achieve quickest time detection when the change time
has a phase-type (PH) distribution and individual agents are performing social learning. As
mentioned above, PH-distributions can approximate arbitrary distributions and so are widely
used in discrete-event systems.

A PH-distributed change time can be modelled as a multi-state Markov chain with an absorbing
state, see [26] and also [36] for a systematic description. (For a 2-state Markov chain, the
PH-distribution specializes to the geometric distribution.) So for quickest time detection with
PH-distributed change time, the belief states (Bayesian posterior) lie in a multidimensional simplex
of probability mass functions.

March 5, 2012 DRAFT

Under what conditions will there exist a threshold stopping policy for quickest detection with
PH-distributed change time and social learning? Under what conditions for the geometric change
time case does the optimal policy coincide with the classical Kolmogorov-Shiryaev model?

To answer these questions, the main results of Sec. V are as follows:
(i) Theorem 4 gives sufficient conditions under which the optimal decision policy for the
global decision maker is myopic and characterized by a linear threshold hyperplane in the
multidimensional simplex. For the geometric case, this result yields an identical threshold to
the Kolmogorov-Shiryaev model.

(ii) Theorem 5 gives sufficient conditions so that the optimal decision policy is characterized by
a single switching curve in the multidimensional simplex. The result uses lattice programming
[49] and structural results involving monotone likelihood ratio stochastic orders [40], [28], and
a novel modification of it. The result is useful because it implies that the global decision to stop
can be implemented efficiently at each agent. Each agent simply needs to compare its belief state
with respect to the threshold curve (in terms of a monotone likelihood ratio partial order on the
space of posterior distributions). Theorem 7 gives sufficient conditions on the optimal linear
approximation to this curve that preserves the monotone likelihood ratio increasing structure of
the optimal decision policy. This linear approximation can be estimated via simulation based
stochastic optimization.

5. Multi-agent Quickest Time Detection with Active Sensing: Sec. VII considers multi-agent quickest
time detection outlined in Example 2 above. We show that the optimal policy is similar to that
in social learning based quickest detection.

II. Social Learning Model and Protocol for Quickest Time Detection 

In this section, the multi-agent social learning model is presented in Sec. II-A. This constitutes
the local decision-making framework for estimating an underlying state. Then Sec. II-B
formulates the costs incurred by the global decision maker in quickest time detection. Sec. II-C
presents the global quickest time detection objective. Finally, Sec. II-D summarizes the entire
social learning quickest detection model.




A. The Multi-agent Social Learning Model 

Consider a countably infinite number of agents (footnote 4) performing social learning to estimate an
underlying state process $x$. Each agent acts once in a predetermined sequential order indexed by
$k = 1, 2, \ldots$. The index $k$ can also be viewed as the discrete time instant when agent $k$ acts.

Let $y_k \in \mathbb{Y} = \{1, 2, \ldots, Y\}$ denote the local (private) observation of agent $k$ and
$a_k \in \mathbb{A} = \{1, 2, \ldots, A\}$ denote the local decision agent $k$ takes. Define the sigma-algebras:

$$ \mathcal{H}_k = \sigma\text{-algebra generated by } (a_1, \ldots, a_{k-1}, y_k), \qquad
   \mathcal{G}_k = \sigma\text{-algebra generated by } (a_1, \ldots, a_{k-1}, a_k). \tag{1} $$

The social learning model [10], [12] comprises the following ingredients:
1. Absorbing-state Markov chain and phase-type distributed change times: The state $x_k$
represents the underlying process that changes at time $\tau^0$. We model the change point $\tau^0$ by a
phase-type (PH) distribution. As mentioned in Sec. I, PH-distributions form a dense subset of
the set of all distributions [36] and so can be used to approximate change times with arbitrary
distribution. This is done by constructing a multi-state Markov chain as follows: Assume the
underlying state $x_k$ evolves as a Markov chain on the finite state space $\mathbb{X} = \{1, \ldots, X\}$. Here
state '1' is an absorbing state and denotes the state after the jump change. The states $2, \ldots, X$
can be viewed as a single composite state that $x$ resides in before the jump.

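The initial distribution is $\pi_0 = (\pi_0(i), i \in \mathbb{X})$, $\pi_0(i) = P(x_0 = i)$. We are only interested in
the case where the change occurs after at least one measurement, so assume $\pi_0(1) = 0$. Since state 1
is absorbing, the transition probability matrix $P$ is of the form

$$ P = \begin{bmatrix} 1 & 0 \\ \underline{P}_{(X-1) \times 1} & \bar{P}_{(X-1) \times (X-1)} \end{bmatrix}. \tag{2} $$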



Let the "change time" $\tau^0$ denote the time at which $x_k$ enters the absorbing state 1, i.e.,

$$ \tau^0 = \inf\{k : x_k = 1\}. \tag{3} $$

The distribution of the change time $\tau^0$ is equivalent to the distribution of the absorption time to
state 1 and is given by

$$ \nu_0 = \pi_0(1), \qquad \nu_k = \bar{\pi}_0' \bar{P}^{k-1} \underline{P}, \quad k \geq 1, \tag{4} $$

where $\bar{\pi}_0 = [\pi_0(2), \ldots, \pi_0(X)]'$. So by appropriately choosing the pair $(\pi_0, P)$ and state space
dimension $X$, one can approximate any given discrete distribution on $[0, \infty)$ by the distribution
$\{\nu_k, k \geq 0\}$; see [36, pp. 240-243]. To ensure that $\tau^0$ is finite, we assume states $2, 3, \ldots, X$ are
transient. In the special case when $x$ is a 2-state Markov chain, the change time $\tau^0$ is geometrically
distributed.

Footnote 4: As mentioned earlier, the same setup holds if a finite number of agents are polled repeatedly
in some pre-defined order, provided each agent picks its local decision based on the most recent public
belief.
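As a rough illustration of the absorption-time distribution (4), the following pure-Python sketch evaluates $\nu_k$ for $k \geq 1$ by repeated matrix-vector products. The function names and the numerical values of the 2-state and 3-state chains are hypothetical choices made for this illustration only; they are not taken from the paper.

```python
def mat_vec(M, v):
    """Matrix-vector product M v for a list-of-lists matrix M."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]


def change_time_pmf(pi0_bar, P_bar, P_under, k_max):
    """Return [nu_1, ..., nu_{k_max}] with nu_k = pi0_bar' P_bar^{k-1} P_under.

    pi0_bar : initial probabilities of the transient states 2..X
    P_bar   : (X-1) x (X-1) transition block among the transient states
    P_under : (X-1) vector of one-step absorption probabilities into state 1
    """
    pmf = []
    v = list(P_under)            # v holds P_bar^{k-1} P_under, starting at k = 1
    for _ in range(k_max):
        pmf.append(sum(p * x for p, x in zip(pi0_bar, v)))
        v = mat_vec(P_bar, v)
    return pmf


# Geometric special case (X = 2): a single transient state, absorbed w.p. 0.1 per step.
geo = change_time_pmf([1.0], [[0.9]], [0.1], 5)

# Hypothetical 3-state chain: start in state 3, move to state 2, then absorb in state 1.
# Transient states are ordered (2, 3); each row of [P_under | P_bar] sums to one.
ph = change_time_pmf([0.0, 1.0], [[0.8, 0.0], [0.3, 0.7]], [0.2, 0.0], 400)
```

In the 3-state example the chain cannot be absorbed in a single step, so the first entry of `ph` is zero, which a geometric distribution cannot reproduce; this is the extra modelling freedom that the PH construction provides.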

2. Local Observation: Agent $k$'s local (private) observation $y_k \in \mathbb{Y} = \{1, \ldots, Y\}$ is obtained
from the observation likelihood distribution

$$ B_{xy} = P(y_k = y | x_k = x). \tag{5} $$

The states $2, 3, \ldots, X$ are fictitious and are defined to generate the PH-distributed change time
$\tau^0$. So states $2, 3, \ldots, X$ are indistinguishable in terms of the observation $y$. That is,
$P(y|2) = P(y|3) = \cdots = P(y|X)$ for all $y \in \mathbb{Y}$.

3. Private belief: Using local observation $y_k$, agent $k$ updates its private belief $\pi^p_k$ defined as

$$ \pi^p_k = (\pi^p_k(i), i \in \mathbb{X}), \quad \pi^p_k(i) = E\{I(x_k = i) | \mathcal{H}_k\}
   = P(x_k = i | a_1, \ldots, a_{k-1}, y_k), \quad \text{initialized by } \pi_0. \tag{6} $$

Thus the private belief is the posterior distribution of the underlying state given the past local
decisions and current observation. It is computed by agent $k$ according to the following Hidden
Markov Model (HMM) filter:

$$ \pi^p_k = T(\pi_{k-1}, y_k), \quad \text{where } T(\pi, y) = \frac{B_y P' \pi}{\sigma(\pi, y)}, \quad
   \sigma(\pi, y) = \mathbf{1}_X' B_y P' \pi, \tag{7} $$
$$ B_y = \mathrm{diag}(B_{1y}, \ldots, B_{Xy}) \quad (X \times X \text{ diagonal matrix for each } y \in \mathbb{Y}). $$

Here $\pi_{k-1}$ denotes the public belief available at time $k - 1$ (defined in Step 5 below).
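The HMM filter (7) is a prediction step followed by a Bayes correction. A minimal sketch in plain Python is given below; the function name `hmm_filter`, the 0-based indexing and the numerical matrices are assumptions made for illustration and do not come from the paper.

```python
def hmm_filter(pi, y, P, B):
    """Private belief update (7): T(pi, y) = B_y P' pi / sigma(pi, y).

    pi : public belief of the previous step (list of length X)
    y  : observation index, 0-based here (the text uses 1..Y)
    P  : X x X transition matrix, P[i][j] = P(x_{k+1} = j | x_k = i)
    B  : X x Y observation matrix, B[x][y] = P(y_k = y | x_k = x)
    Returns (private belief, sigma) with sigma = 1' B_y P' pi.
    Assumes sigma > 0, i.e. observation y has positive probability under pi.
    """
    X = len(pi)
    predicted = [sum(P[i][j] * pi[i] for i in range(X)) for j in range(X)]  # P' pi
    unnorm = [B[j][y] * predicted[j] for j in range(X)]                      # B_y P' pi
    sigma = sum(unnorm)
    return [u / sigma for u in unnorm], sigma


# Hypothetical 2-state example: index 0 is the absorbing "changed" state 1.
P = [[1.0, 0.0],
     [0.05, 0.95]]
B = [[0.8, 0.2],
     [0.3, 0.7]]
belief, sigma0 = hmm_filter([0.0, 1.0], 0, P, B)
_, sigma1 = hmm_filter([0.0, 1.0], 1, P, B)
```

The normalizers `sigma0` and `sigma1` are the predicted probabilities of the two observations, so they sum to one.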

4. Agent's local decision: Agent $k$ then makes local decision $a_k \in \mathbb{A} = \{1, 2, \ldots, A\}$ to
myopically minimize its expected cost. To formulate this, let $c(i, a)$ denote the non-negative cost
incurred if the agent picks local decision $a$ when the underlying state is $x = i$. Denote the local
decision $X$-dimensional cost vector

$$ c_a = [c(1, a) \;\; c(2, a) \;\; \cdots \;\; c(X, a)]'. \tag{8} $$

Then agent $k$ chooses local decision $a_k$ greedily to minimize its expected cost:

$$ a_k = a(\pi_{k-1}, y_k) = \arg\min_{a \in \mathbb{A}} E\{c(x, a) | \mathcal{H}_k\}
       = \arg\min_{a \in \mathbb{A}} \{c_a' \pi^p_k\}. \tag{9} $$

In quickest change detection, since states $2, 3, \ldots, X$ are indistinguishable in terms of observation
$y$, we assume that $c(2, a) = c(3, a) = \cdots = c(X, a)$ for each $a \in \mathbb{A}$.
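The myopic rule (9) is simply an argmin of expected costs under the private belief. The sketch below illustrates it; the function name, the cost matrix and the tie-breaking rule (lowest index wins) are assumptions of this example, since the text does not specify how ties are resolved.

```python
def local_decision(private_belief, c):
    """Myopic local decision (9): argmin over a of c_a' (private belief).

    private_belief : posterior over states given past decisions and own observation
    c              : X x A cost matrix, c[i][a] = cost of decision a in state i (0-based)
    Returns (index of the minimizing decision, list of expected costs).
    Ties are broken in favour of the lowest index (an assumption of this sketch).
    """
    X, A = len(private_belief), len(c[0])
    expected = [sum(c[i][a] * private_belief[i] for i in range(X)) for a in range(A)]
    best = min(range(A), key=lambda a: expected[a])
    return best, expected


# Hypothetical costs: decision 0 = "changed", decision 1 = "not changed".
c = [[0.0, 2.0],   # state 1 (changed): missing the change costs 2
     [1.0, 0.0]]   # state 2 (no change): a wrong "changed" call costs 1
a_hi, cost_hi = local_decision([0.4, 0.6], c)   # fairly likely that a change occurred
a_lo, cost_lo = local_decision([0.2, 0.8], c)   # change unlikely
```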

5. Social learning public belief: Finally agent $k$ broadcasts its local decision $a_k$. Subsequent
agents $\bar{k} > k$ use decision $a_k$ to update their public belief of the underlying state $x_k$ as follows:
Define the public belief $\pi_k$ as the posterior distribution of the state $x$ given all local decisions
taken up to time $k$:

$$ \pi_k = E\{x_k | \mathcal{G}_k\} = (\pi_k(i), i \in \mathbb{X}), \quad \pi_k(i) = P(x_k = i | a_1, \ldots, a_k),
   \quad \text{initialized by } \pi_0. \tag{10} $$

Then agents $\bar{k} > k$ update their public belief according to the following "social learning Bayesian
filter":

$$ \pi_k = T^{\pi_{k-1}}(\pi_{k-1}, a_k), \quad \text{where } T^\pi(\pi, a) = \frac{R^\pi_a P' \pi}{\sigma(\pi, a)}, \quad
   \sigma(\pi, a) = \mathbf{1}_X' R^\pi_a P' \pi. \tag{11} $$

We use the notation $T^\pi(\cdot)$ to point out that the above Bayesian update map depends explicitly
on the belief state $\pi$. (For notational simplicity we have chosen not to use the superscript $\pi$ for
$\sigma(\pi, a)$.) This is a key difference compared to the HMM filter (7) where the Bayesian update
map $T(\cdot)$ does not depend explicitly on belief state $\pi$. In (11), $R^\pi_a$ denotes the diagonal matrix
$R^\pi_a = \mathrm{diag}(R^\pi_{ia}, i \in \mathbb{X})$ where

$$ R^\pi_{ia} = P(a_k = a | x_k = i, \pi_{k-1} = \pi) \tag{12} $$

denotes the conditional probability that agent $k$ chose local decision $a$ given state $i$. We call $R^\pi_{ia}$
the local decision likelihood probabilities in analogy to the observation likelihood probabilities
$B_{iy}$ (5) in classical filtering.

Clearly observing the local decision $a_k$ taken by agent $k$ yields information about its local
observation $y_k$. That is, $a_k$ serves as a surrogate observation of the underlying state $x_k$. The
following lemma summarizes how subsequent agents use $a_k$ to compute the local decision
likelihood probabilities $R^\pi_{ia}$ in the social learning filter. The proof is straightforward and omitted.

Lemma 1. The local decision likelihood probability matrix $R^\pi$ in the social learning Bayesian
filter (11) is computed as

$$ R^\pi = B M^\pi, \quad \text{where } M^\pi_{ya} = P(a | y, \pi)
   = \prod_{\tilde{a} \in \mathbb{A} - \{a\}} I(c_a' B_y P' \pi < c_{\tilde{a}}' B_y P' \pi). \tag{13} $$

Here $M^\pi$ is a $Y \times A$ matrix, $B$, $B_y$ are the private observation probabilities defined in (5), (7),
$c_a$, $c_{\tilde{a}}$ are the local cost vectors defined in (8), and $I(\cdot)$ denotes the indicator function. ■

The main implication of Lemma 1 is that the social learning Bayesian filter (11) is discontinuous
in the belief state $\pi$, due to the presence of indicator functions in (13). The likelihood
probabilities $R^\pi$ in (12) are an explicit function of the belief state $\pi$; this is in stark contrast to
the standard quickest detection problems where the observation distribution is not an explicit
function of the posterior distribution.
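The construction in Lemma 1 can be sketched as follows. The code builds the 0/1 matrix of optimal decisions per observation and multiplies it by the observation matrix. The function name and numerical values are illustrative assumptions; in addition, ties are resolved in favour of the lowest decision index, which differs from the strict inequalities in (13) only on the boundary set where two decisions have equal expected cost.

```python
def decision_likelihoods(pi, P, B, c):
    """Return (M, R): M[y][a] = 1 if decision a is optimal given observation y, and R = B M.

    The normalizer sigma(pi, y) is omitted because it is a positive scaling that
    does not change the minimizing decision.
    """
    X, Y, A = len(pi), len(B[0]), len(c[0])
    predicted = [sum(P[i][j] * pi[i] for i in range(X)) for j in range(X)]    # P' pi
    M = []
    for y in range(Y):
        unnorm = [B[j][y] * predicted[j] for j in range(X)]                   # B_y P' pi
        cost = [sum(c[j][a] * unnorm[j] for j in range(X)) for a in range(A)]
        best = min(range(A), key=lambda a: cost[a])    # ties -> lowest index (assumption)
        M.append([1.0 if a == best else 0.0 for a in range(A)])
    R = [[sum(B[i][y] * M[y][a] for y in range(Y)) for a in range(A)] for i in range(X)]
    return M, R


# Hypothetical 2-state model (index 0 = changed state 1).
P = [[1.0, 0.0], [0.05, 0.95]]
B = [[0.8, 0.2], [0.3, 0.7]]
c = [[0.0, 2.0], [1.0, 0.0]]

M_mid, R_mid = decision_likelihoods([0.5, 0.5], P, B, c)    # decision tracks the observation
M_edge, R_edge = decision_likelihoods([0.0, 1.0], P, B, c)  # same decision for every observation
```

For the belief `[0.5, 0.5]` the decision reveals the observation, so `R_mid` coincides with `B`. For the belief `[0.0, 1.0]` every observation leads to the same decision, so each row of `R_edge` puts all its mass on that decision and the decision carries no information about the state.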

Summary: A key aspect of the information pattern in the above social learning protocol is that
agent $k$ does not have access to the private belief state $\pi^p_{k-1}$ or private observations of previous
agents. Instead each agent $k$ only has access to the local decisions taken by previous agents
together with its own current private observation $y_k$. The fact that the likelihood probabilities
$R^\pi$ are an explicit function of the public belief state $\pi$ (see (13)) is an important aspect of social
learning that is not present in classical sequential detection problems. It makes the Bayesian
update of the public belief discontinuous in $\pi$ and makes our proofs substantially harder than
standard concavity arguments in classical quickest detection problems.
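The discontinuity of the public belief update can be seen numerically. The sketch below implements the social learning filter (11) for a hypothetical 2-state model (all numbers, names and the lowest-index tie-breaking rule are assumptions of this example) and evaluates it at two nearby public beliefs that fall on either side of a decision boundary.

```python
def social_learning_filter(pi, a, P, B, c):
    """Public belief update (11): T^pi(pi, a) = R^pi_a P' pi / sigma(pi, a).

    Returns (posterior, sigma); posterior is None when decision a has zero
    probability under the public belief pi. Indices are 0-based.
    """
    X, Y, A = len(pi), len(B[0]), len(c[0])
    pred = [sum(P[i][j] * pi[i] for i in range(X)) for j in range(X)]     # P' pi
    R_a = [0.0] * X                                                       # diagonal of R^pi_a
    for y in range(Y):
        w = [B[j][y] * pred[j] for j in range(X)]                         # B_y P' pi
        cost = [sum(c[j][b] * w[j] for j in range(X)) for b in range(A)]
        if min(range(A), key=lambda b: cost[b]) == a:                     # ties -> lowest index
            for i in range(X):
                R_a[i] += B[i][y]
    w = [R_a[j] * pred[j] for j in range(X)]
    sigma = sum(w)
    if sigma == 0.0:
        return None, 0.0
    return [v / sigma for v in w], sigma


P = [[1.0, 0.0], [0.05, 0.95]]
B = [[0.8, 0.2], [0.3, 0.7]]
c = [[0.0, 2.0], [1.0, 0.0]]

# Inside a cascade region: decision 1 is taken whatever the observation, so the
# update reduces to the prediction P' pi and decision 0 has zero probability.
post_c, sigma_c = social_learning_filter([0.0, 1.0], 1, P, B, c)
post_z, sigma_z = social_learning_filter([0.0, 1.0], 0, P, B, c)

# Two nearby public beliefs on either side of a decision boundary.
post_left, _ = social_learning_filter([0.11, 0.89], 1, P, B, c)
post_right, _ = social_learning_filter([0.12, 0.88], 1, P, B, c)
```

Moving the prior probability of a change from 0.11 to 0.12 switches the agent from ignoring its observation to following it, so the posterior probability of a change after observing decision 1 drops abruptly rather than varying smoothly.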

Belief State Space: Before proceeding with the quickest time detection formulation, we briefly
describe the space in which the public belief $\pi$ defined in (10) lives. The public belief belongs
to the unit $X - 1$ dimensional simplex denoted as

$$ \Pi(X) = \{\pi \in \mathbb{R}^X : \mathbf{1}_X' \pi = 1, \; 0 \leq \pi(i) \leq 1 \text{ for all } i \in \mathbb{X}\}. \tag{14} $$

So for geometric-distributed change times, the belief state space $\Pi(2)$ is the interval $[0, 1]$.
For PH-distributed change times, the belief space $\Pi(X)$ is a multi-dimensional simplex. For
example, $\Pi(3)$ is a two-dimensional unit simplex (equilateral triangle); $\Pi(4)$ is a tetrahedron,
etc. The vertices of the unit simplex $\Pi(X)$ are the unit $X$-dimensional vectors $e_1, \ldots, e_X$, where

$$ e_i \text{ denotes the unit vector with 1 in the } i\text{th position}, \; i \in \mathbb{X}. \tag{15} $$

Of course the private belief $\pi^p$ (6) also lives in $\Pi(X)$.

B. Quickest Time Detection: Costs Incurred by Global Decision Maker 

With the above social learning based local decision framework, we now formulate the quickest
time detection problem faced by the global decision maker. At each time $k$, given the public
belief $\pi_k$, let $u_k$ denote the global decision taken:

$$ u_k = \mu(\pi_k) \in \{1 \text{ (announce change and stop)}, \; 2 \text{ (continue)}\}. \tag{16} $$

Thus the global decision $u_k$ is $\mathcal{G}_k$ measurable, where $\mathcal{G}_k$ is defined in (1). In (16), the policy
$\mu$ belongs to the class of stationary decision policies denoted $\boldsymbol{\mu}$. Below we formulate the costs
incurred when taking these global decisions $u_k$.

(i) Cost of announcing change and stopping: If global decision $u_k = 1$ is chosen, then the
social learning protocol of Sec. II-A terminates. If $u_k = 1$ is chosen before the change point $\tau^0$,
then a false alarm penalty is incurred. The false alarm event
$\cup_{i \geq 2} \{x_k = i\} \cap \{u_k = 1\} = \{x_k \neq 1\} \cap \{u_k = 1\}$ represents the event that a change is
announced before the change happens at time $\tau^0$. To evaluate the false alarm penalty, let
$f_i I(x_k = i, u_k = 1)$ denote the cost of a false alarm in state $i$, $i \in \mathbb{X}$, where $f_i \geq 0$. Of course,
$f_1 = 0$ since a false alarm is only incurred if the stop action is picked in states $2, \ldots, X$. The
expected false alarm penalty is

$$ \bar{C}(\pi_k, u_k = 1) = \sum_{i \in \mathbb{X}} f_i E\{I(x_k = i, u_k = 1) | \mathcal{G}_k\} = f' \pi_k,
   \quad \text{where } f = (f_1, \ldots, f_X)', \; f_1 = 0. \tag{17} $$

The false alarm vector $f$ is chosen with increasing elements so that states further from state 1
incur larger penalties. (Obviously $f \geq 0$ elementwise, since $f_1 = 0$ and the elements are increasing.)

(ii) Delay cost of continuing: If global decision $u_k = 2$ is taken then the social learning protocol
of Sec. II-A continues to time $k + 1$. A delay cost is incurred when the event $\{x_k = 1, u_k = 2\}$
occurs, i.e., no change is declared at time $k$, even though the state has changed at time $k$. The
expected delay cost is

$$ \bar{C}(\pi_k, u_k = 2) = d \, E\{I(x_k = 1, u_k = 2) | \mathcal{G}_k\} = d \, e_1' \pi_k, \tag{18} $$

where $d > 0$ denotes the delay cost and $e_1$ is defined in (15).

Remarks: (i) Recall that the public belief state $\pi$ depends on the local decisions $a$. Also the
choice of global decision $u$ determines when the local decision process terminates. This links
the local and global decision makers.
(ii) The above costs (17), (18) should be viewed as an example only. The results of this paper also
apply to more general stopping time problems with minor modifications if the global decisions
$u_k$ are $\mathcal{H}_k$ measurable (instead of $\mathcal{G}_k$ measurable), where $\mathcal{H}_k$ and $\mathcal{G}_k$ are defined in (1). More
generally, $\bar{C}(\pi, u)$ can also include the local decision cost incurred in social learning; see the remark
at the end of Sec. V-B.




C. Quickest Time Detection Objective 

Let $(\Omega, \mathcal{F})$ be the underlying measurable space where $\Omega = (\mathbb{X} \times \mathbb{U} \times \mathbb{Y})^\infty$ is the product space,
which is endowed with the product topology, and $\mathcal{F}$ is the corresponding product sigma-algebra.
For any $\pi_0 \in \Pi(X)$ and policy $\mu \in \boldsymbol{\mu}$, there exists a (unique) probability measure $P^\mu_{\pi_0}$ on
$(\Omega, \mathcal{F})$; see [20] for details. Let $E^\mu_{\pi_0}$ denote the expectation with respect to the measure $P^\mu_{\pi_0}$.

Let $\tau$ denote a stopping time adapted to the sequence of $\sigma$-algebras $\mathcal{G}_k$, $k \geq 1$; see (1). That
is, with $u_k$ determined by decision policy (16),

$$ \tau = \inf\{k : u_k = 1\}. \tag{19} $$

For each initial distribution $\pi_0 \in \Pi(X)$ and policy $\mu$, the following cost is associated:

$$ J_\mu(\pi_0) = E^\mu_{\pi_0} \Big\{ \sum_{k=1}^{\tau - 1} \rho^{k-1} \bar{C}(\pi_k, u_k = 2)
   + \rho^{\tau - 1} \bar{C}(\pi_\tau, u_\tau = 1) \Big\}. \tag{20} $$

Here $\rho \in [0, 1]$ denotes an economic discount factor. Since $\bar{C}(\pi, 1)$, $\bar{C}(\pi, 2)$ are non-negative and
bounded for all $\pi \in \Pi(X)$, stopping is guaranteed in finite time, i.e., $\tau$ is finite with probability
1 for any $\rho \in [0, 1]$ (including $\rho = 1$).

Kolmogorov-Shiryaev criterion: Suppose $\mathbb{X} = \{1, 2\}$, implying that the change time $\tau^0$ is
geometrically distributed. Choose the false alarm vector $f = f_2 e_2 = [0, f_2]'$ where $f_2$ is a
positive constant, delay cost (18), and discount factor $\rho = 1$. Then the quickest time objective
(20) assumes the classical Kolmogorov-Shiryaev criterion for detection of disorder [?]:

$$ J_\mu(\pi_0) = d \, E^\mu_{\pi_0}\{(\tau - \tau^0)^+\} + f_2 \, P^\mu_{\pi_0}(\tau < \tau^0). \tag{21} $$

However, unlike classical quickest detection, the posterior (public belief) $\pi$ has discontinuous
dynamics given by the social learning Bayesian filter (11). (Recall from (11), (13) that the
dynamics of public belief $\pi$ depend on the local decision costs $c_a$.) ■
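The reduction of the general cost (20) to the Kolmogorov-Shiryaev criterion can be checked term by term. The following worked derivation is a sketch under the stated choices $\mathbb{X} = \{1, 2\}$, $f = f_2 e_2$, $\rho = 1$; it uses the smoothing property of conditional expectation together with the facts that $\{\tau > k\}$ is $\mathcal{G}_k$ measurable and that $x_k = 1$ if and only if $k \geq \tau^0$ (state 1 is absorbing and $\tau^0 \geq 1$).

```latex
\begin{align*}
E^\mu_{\pi_0}\Big\{\sum_{k=1}^{\tau-1} d\, e_1'\pi_k\Big\}
  &= d\, E^\mu_{\pi_0}\Big\{\sum_{k \geq 1} I(\tau > k)\, P(x_k = 1 \mid \mathcal{G}_k)\Big\}
   = d\, E^\mu_{\pi_0}\Big\{\sum_{k=1}^{\tau-1} I(x_k = 1)\Big\} \\
  &= d\, E^\mu_{\pi_0}\Big\{\sum_{k=1}^{\tau-1} I(k \geq \tau^0)\Big\}
   = d\, E^\mu_{\pi_0}\{(\tau - \tau^0)^+\}, \\[4pt]
E^\mu_{\pi_0}\{f'\pi_\tau\}
  &= f_2\, E^\mu_{\pi_0}\{P(x_\tau = 2 \mid \mathcal{G}_\tau)\}
   = f_2\, P^\mu_{\pi_0}(x_\tau = 2)
   = f_2\, P^\mu_{\pi_0}(\tau < \tau^0).
\end{align*}
```

Adding the two lines gives the delay term and the false alarm term of the criterion.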

The goal of the global decision maker is to determine the change time $\tau^0$ with minimal cost,
that is, compute the optimal global decision policy $\mu^* \in \boldsymbol{\mu}$ to minimize (20), where

$$ J_{\mu^*}(\pi_0) = \inf_{\mu \in \boldsymbol{\mu}} J_\mu(\pi_0). $$

The existence of an optimal stationary policy $\mu^*$ follows from [9, Prop. 1.3, Chapter 3].




D. Summary 

In summary, the social learning based quickest detection problem with PH-distributed change
time is specified by the model

$$ (P, B, c, C, \rho, \mathbb{X}, \mathbb{Y}, \mathbb{A}, \mathbb{U}) \tag{22} $$

where $P$ is the transition probability matrix (2), $B$ is the private observation matrix (5), $c$ are
the local decision costs (8), $C$ defined in (24) is the transformed global decision cost vector for
quickest detection (in terms of false alarm $f$ (17) and delay penalty $d$ (18)), and $\rho \in [0, 1]$ is the
discount factor (20). Also $\mathbb{X}$ is the state space, $\mathbb{Y}$ is the private observation space, $\mathbb{A}$ is the local
decision space and $\mathbb{U} = \{1 \text{ (stop)}, 2 \text{ (continue)}\}$ is the global decision space.

III. Stochastic Dynamic Programming Formulation and Dominance of Classical Quickest Detection

Sec. III-A formulates the optimal decision policy for social learning based quickest detection
as the solution of a stochastic dynamic programming problem. Sec. III-B describes why social
learning based quickest detection is a non-trivial extension of the standard quickest detection
problem. Finally, Sec. III-C presents our first structural result: it uses Blackwell dominance of
measures to show that the optimal cost incurred in quickest time detection with social learning is
always larger than that with classical quickest detection.

A. Stochastic Dynamic Programming Formulation 

Given the stopping time problem (20), it is well known [33] that the optimal policy μ*(π)
can be expressed as the solution of a stochastic dynamic programming problem in terms of the
belief state π. Our characterization of the structure of the optimal policy μ*(π) will be based on
analyzing the structure of this dynamic programming problem.

The optimal stationary policy μ* : Π(X) → {1, 2} and associated value function V(π) of the
stopping time problem (20) are the solution of Bellman's dynamic programming equation

\[ \begin{aligned}
\mu^*(\pi) &= \arg\min\Big\{ C(\pi,1),\; C(\pi,2) + \rho \sum_{a \in A} V\big(T^\pi(\pi,a)\big)\, \sigma(\pi,a) \Big\}, \qquad J_{\mu^*}(\pi_0) = V(\pi_0), \\
V(\pi) &= \min\Big\{ C(\pi,1),\; C(\pi,2) + \rho \sum_{a \in A} V\big(T^\pi(\pi,a)\big)\, \sigma(\pi,a) \Big\}.
\end{aligned} \tag{23} \]

Here the global decision maker's costs C(π, u) are defined in (17), (18), T^π is the public belief
Bayesian update (11), and the measure σ(π, a) is defined in (11).

For our subsequent analysis, it is convenient to rewrite Bellman's equation as follows. Define
the transformed value function and global decision costs V̄(π), C̄(π, 1) and C̄(π, 2) as follows:

\[ \bar V(\pi) = V(\pi) - f'\pi, \qquad \bar C(\pi,1) = 0, \qquad \bar C(\pi,2) = C(\pi,2) - f'\pi + \rho f'P'\pi = \bar C'\pi, \tag{24} \]

where C̄ = d e_1 − (I − ρP) f with elements denoted as C̄_j, j = 1, . . . , X.

Then clearly V̄(π) satisfies Bellman's dynamic programming equation

\[ \mu^*(\pi) = \arg\min_{u \in U} Q(\pi,u), \qquad J_{\mu^*}(\pi_0) = V(\pi_0), \qquad \bar V(\pi) = \min_{u \in \{1,2\}} Q(\pi,u), \tag{25} \]

\[ \text{where } Q(\pi,2) = \bar C(\pi,2) + \rho \sum_{a \in A} \bar V\big(T^\pi(\pi,a)\big)\, \sigma(\pi,a), \qquad Q(\pi,1) = \bar C(\pi,1) = 0. \]

The above transformation⁵ is convenient since the transformed stopping cost C̄(π, 1) = 0 and
C̄(π, 2) = C̄'π in (24) captures all the costs involved in quickest detection. Of course, the optimal
policy μ*(π) and hence the stopping set S remain unchanged under this coordinate transformation.
The goal for the global decision-maker is to determine the optimal stopping set, denoted S. That
is, S is the set of public belief states π for which it is optimal to declare a change and stop:



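\[ \begin{aligned}
S = \{\pi \in \Pi(X) : \mu^*(\pi) = 1\} &= \Big\{\pi \in \Pi(X) : C(\pi,1) < C(\pi,2) + \rho \sum_{a \in A} V\big(T^\pi(\pi,a)\big)\, \sigma(\pi,a) \Big\} \\
&= \Big\{\pi \in \Pi(X) : \bar C(\pi,1) < \bar C(\pi,2) + \rho \sum_{a \in A} \bar V\big(T^\pi(\pi,a)\big)\, \sigma(\pi,a) \Big\}.
\end{aligned} \tag{26} \]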
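The step from (23) to (25) can be checked in two lines. The following is a sketch, assuming (as the references to (11), (13) and (17) suggest, although those equations lie outside this section) that the filter has the form T^π(π, a) = R^π_a P'π / σ(π, a), that the diagonal matrices R^π_a sum to the identity over a ∈ A, and that the stopping cost is C(π, 1) = f'π:

```latex
\begin{align*}
\sum_{a \in A} f'\, T^\pi(\pi,a)\, \sigma(\pi,a)
  &= f' \Big( \sum_{a \in A} R^\pi_a \Big) P'\pi = f'P'\pi, \\
V(\pi) - f'\pi
  &= \min\Big\{ C(\pi,1) - f'\pi,\;
     C(\pi,2) - f'\pi + \rho f'P'\pi
     + \rho \sum_{a \in A} \bar V\big(T^\pi(\pi,a)\big)\, \sigma(\pi,a) \Big\}.
\end{align*}
```

Under these assumptions the first term in the minimum is zero and the second is C̄'π plus the discounted expectation of V̄, which is the form stated in (25).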

Value Iteration Algorithm: Let k = 1, 2, . . . denote the iteration number (the fact that we used
k previously to denote time should not result in confusion). The value iteration algorithm is a
fixed point iteration of Bellman's equation (25) and proceeds as follows: V̄_0(π) = −C(π, 1) and

\[ \bar V_{k+1}(\pi) = \min_{u \in \{1,2\}} Q_{k+1}(\pi,u), \qquad \mu^*_{k+1}(\pi) = \arg\min_{u \in \{1,2\}} Q_{k+1}(\pi,u), \qquad \pi \in \Pi(X), \]

\[ \text{where } Q_{k+1}(\pi,2) = \bar C(\pi,2) + \rho \sum_{a \in A} \bar V_k\big(T^\pi(\pi,a)\big)\, \sigma(\pi,a), \qquad Q_{k+1}(\pi,1) = \bar C(\pi,1) = 0. \tag{27} \]

Let B(X) denote the set of bounded real-valued functions on Π(X). Since C(π, 1), C(π, 2),
π ∈ Π(X), are bounded, the value iteration algorithm (27) will generate a sequence of lower
semi-continuous value functions {V̄_k} ⊂ B(X) that will converge pointwise as k → ∞ to
V̄(π) ∈ B(X), the solution of Bellman's equation; see [9, Prop. 1.3, Chap. 3, Vol. 2].

5 This transformation is used in [21, pp. 389] to deal with stopping time problems. As a result of this transformation, the initial
condition of the value iteration algorithm is modified, see (27).

Since the belief state space Π(X) in (14) is a unit simplex, the value iteration algorithm (27)
does not yield a practical solution methodology for computing the stopping set S, since V̄_k(π) needs
to be evaluated on the continuum π ∈ Π(X). Although Bellman's equation and the value iteration
algorithm are not useful from a computational point of view, in subsequent sections we exploit
their structure to characterize the stopping set S in (26). We then exploit this structure to devise
stochastic gradient algorithms for approximating the optimal policy μ* and thus determining the
stopping set S.

B. Why Social Learning based Quickest Detection is non-trivial 

Let us illustrate why social learning based quickest detection results in non-trivial behavior.
We will show in Sec. IV that the belief space Π(X) can be decomposed into Y + 1 polytopes
denoted 𝒫_1, . . . , 𝒫_{Y+1} such that on each of these polytopes 𝒫_l the belief state update T^π(π, a) =
T^l(π, a). Consider the value iteration algorithm (27), which is used as a basis for mathematical
induction to prove properties associated with Bellman's equation (25). It can be expressed as⁶

\[ \begin{aligned}
\bar V_{k+1}(\pi) &= \min\Big\{ \bar C'\pi + \rho \sum_{a} \sum_{l=1}^{Y+1} \bar V_k\big(T^l(\pi,a)\big)\, \sigma(\pi,a)\, I(\pi \in \mathcal{P}_l),\; 0 \Big\} \\
&= \min\Big\{ \bar C'\pi + \rho \sum_{a} \sum_{l=1}^{Y+1} \bar V_k\big(R^l_a P'\pi\big)\, I(\pi \in \mathcal{P}_l),\; 0 \Big\}.
\end{aligned} \tag{28} \]

It should be clear from (28) that if V̄_k(π) is assumed to be concave on Π(X), V̄_{k+1}(π) is not
necessarily concave on Π(X). In fact, even if V̄_k(π) is assumed to be concave in just one of
the polytopes, say polytope 𝒫_l, then V̄_{k+1}(π) is not necessarily concave on 𝒫_l, since T^l(π, a)
in (28) may map two distinct belief states in polytope 𝒫_l to two different polytopes. As will be
shown in numerical examples, in general V̄(π) will be discontinuous and non-concave.

6 Note that from (27), V̄_k(π) is positively homogeneous, that is, for any α > 0, V̄_k(απ) = α V̄_k(π). So choosing α = σ(π, a),
which is the denominator term of T^π in (11), yields the expression in the second equality of (28).

Classical quickest detection problems are special instances of partially observed Markov
decision process (POMDP) stopping time problems [26]. In POMDPs, the belief state update T^π
is not an explicit function of the belief state π since the observation probabilities are not an explicit
function of π. For such POMDP stopping time problems the value iteration algorithm reads⁷

\[ \underline{V}_{k+1}(\pi) = \min\Big\{ \bar C'\pi + \rho \sum_{y} \sum_{l=1}^{Y+1} \underline{V}_k\big(B_y P'\pi\big)\, I(\pi \in \mathcal{P}_l),\; 0 \Big\} \]

and is to be compared with (28). Since the composition of a concave function with a linear
function preserves concavity, it is easily seen that if V̲_k(π) is piecewise linear and concave, then
so is V̲_{k+1}(π). So by mathematical induction on the value iteration algorithm, and since the
sequence {V̲_k(π)} converges pointwise (actually uniformly for POMDPs) to V̲(π), the value
function V̲(π) is concave and the stopping set S is a convex (and therefore connected) set [32].

The key difference in the above social learning quickest detection formulation is that the local
decision likelihoods R^π (13) and therefore the social learning filter T^π are explicit and discontinuous
functions of π. This results in a possibly non-concave value function V̄(π), which makes determining
S non-trivial.

C. Quickest Time Detection with Social Learning is More Expensive 

This section presents our first main result. We prove that quickest detection with social learning 
is always more expensive than classical quickest detection. In social learning, agents have access 
to local decisions of previous agents instead of the actual observations. Thus one would expect 
intuitively that this information loss results in less efficient quickest time change detection 
compared to classical quickest detection. Here we confirm this intuition. The main idea is to use 
Blackwell dominance of observation measures. 

1) Notation: First define the optimal policy and cost in classical quickest time detection.
Similar to (25), the optimal policy μ̲*(π) and cost V̲(π) incurred in classical quickest detection
satisfy the following Bellman's equation:

\[ \underline{\mu}^*(\pi) = \arg\min_{u \in U} \underline{Q}(\pi,u), \qquad J_{\underline{\mu}^*}(\pi_0) = \underline{V}(\pi_0), \qquad \underline{V}(\pi) = \min_{u \in \{1,2\}} \underline{Q}(\pi,u), \tag{29} \]

\[ \text{where } \underline{Q}(\pi,2) = \bar C(\pi,2) + \rho \sum_{y \in Y} \underline{V}\big(T(\pi,y)\big)\, \sigma(\pi,y), \qquad \underline{Q}(\pi,1) = \bar C(\pi,1) = 0. \]

7 We use the notation V̲(π) to denote the value function of the classical stopping problem. This will be defined formally in
Sec. III-C, where we will show V̲(π) ≤ V̄(π), i.e., quickest detection with social learning always incurs a higher optimal cost
than classical quickest detection.

Recall T(π, y) is the Hidden Markov Model Bayesian filter defined in ©. Thus the only
difference between the classical and social learning quickest detection problems is the update of
the belief state, namely © in the classical setup versus (11) in the social learning formulation.

2) Main Result: The following theorem says that if the initial belief state is chosen from any
of the polytopes 𝒫_1, . . . , 𝒫_{Y+1}, the optimal detection policy with social learning incurs a higher
cost than classical quickest detection.

Theorem 1. Consider the social learning quickest time detection problem (P, B, c, C̄, ρ) in
(22) and associated value function V̄(π) in (25). Consider also the classical quickest detection
problem with value function V̲(π) in (29). Then for any initial belief state π ∈ Π(X), the optimal
cost incurred by classical quickest detection is smaller than that of quickest detection with social
learning. That is, V̲(π) ≤ V̄(π).

Since the theorem holds for the case A = Y = 2 (equal number of local decision choices and 
observation symbols), a naive explanation that information is lost due to using fewer symbols 
in A compared to Y is not true. 

The proof of Theorem 1 is given in Appendix B. Recall from (13) that R^π = B M^π, where
B and M^π are stochastic matrices. Thus observation y with conditional distribution specified
by B is said to be more informative than (Blackwell dominates) observation a with conditional
distribution R^π, see [40]. The main idea in the proof is that under the assumptions of Theorem 1
the value function V̲(π) is concave for π ∈ Π(X). The result is then established by applying
Jensen's inequality together with Blackwell dominance to the value iteration algorithm for
Bellman's equation.

The first instance of a similar proof using Blackwell dominance for POMDPs was given in
[50], see also [40], where it was used to show optimality of certain myopic policies. Our use
of Blackwell dominance in Theorem 1 is somewhat different since we are using it to compare
the value functions of two different dynamic programming problems. A useful consequence of
Theorem 1 is that performance analysis of standard quickest detection problems [48] readily
applies to form a lower bound for the cost incurred in social learning based quickest detection.
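The Jensen step behind Theorem 1 can be checked numerically. The sketch below is illustrative only and is not taken from the paper: it assumes the classical filter has the form T(π, y) ∝ B_y P'π and the social learning filter the form T^π(π, a) ∝ R_a P'π with R = B M, and the function names are hypothetical. For any concave φ, the expected value of φ after a social learning update should be at least the expected value after a classical update, because each social learning posterior is a convex combination of classical posteriors.

```python
import numpy as np

def hmm_terms(pi, P, B):
    """Classical filter: row y holds the unnormalized posterior B_y P' pi."""
    pred = P.T @ pi                       # one-step prediction P' pi
    return (B * pred[:, None]).T          # shape (Y, X)

def social_terms(pi, P, B, M):
    """Social learning filter: row a holds R_a P' pi, where R = B M (assumed form)."""
    return M.T @ hmm_terms(pi, P, B)      # row a = sum_y M[y, a] * (B_y P' pi)

def expected_value(phi, unnormalized):
    """sum_k sigma_k * phi(T_k), with sigma_k = sum(row_k) and T_k = row_k / sigma_k."""
    total = 0.0
    for row in unnormalized:
        sigma = row.sum()
        if sigma > 0:                     # zero-probability outcomes contribute nothing
            total += sigma * phi(row / sigma)
    return total

if __name__ == "__main__":
    P = np.array([[1.0, 0.0], [0.1, 0.9]])
    B = np.array([[0.85, 0.15], [0.15, 0.85]])
    M = np.array([[0.0, 1.0], [0.0, 1.0]])            # every observation maps to a = 2
    pi = np.array([0.4, 0.6])
    phi = lambda v: min(v @ np.array([1.0, -1.0]), v @ np.array([-0.5, 2.0]))  # concave
    gap = expected_value(phi, social_terms(pi, P, B, M)) - expected_value(phi, hmm_terms(pi, P, B))
    print("social minus classical:", gap)             # non-negative for concave phi
```

Since the value functions are minimized costs, a larger expected continuation value under social learning translates into a larger optimal cost, which is the direction of the inequality in Theorem 1.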
IV. Assumptions and Quickest Detection with Small Change Probabilities

This section comprises two parts:
(i) Sec. IV-A lists the main assumptions (A1), (A2), (S), which result in a natural partition of the
belief space Π(X) into Y + 1 convex polytopes, with the decision likelihoods R^π (defined in (13))
being constant (with respect to π) on each polytope (Theorem 2). These polytopes play an
important role in specifying the global quickest detection policy in the rest of the paper.
(ii) Sec. IV-B considers quickest time change detection with geometrically distributed change time
and gives explicit conditions for the optimal policy to have a double threshold. In particular,
Theorem 3 and Corollary 1 show that the optimal quickest-time detection policy for change
probability ε yields a cost that is within O(ε) of the optimal cost for sequential detection of a
constant state.

A. Polytope Structure and Main Assumptions 

Since the public belief state π ∈ Π(X) is a continuum (see (14)), as a first step in characterizing
the optimal policy μ*(π), we need to understand the structure of the decision likelihood
probabilities R^π defined in (13). Even though the belief state π ∈ Π(X) is a continuum, it turns
out that there are only 2^Y − 1 possible local decision likelihood probability matrices R^π. Let Q_l,
l = 1, . . . , 2^Y − 1 denote the elements of the power set of Y (excluding, of course, the empty
set). Define the following 2^Y − 1 convex polytopes 𝒫_l, l = 1, 2, . . . , 2^Y − 1:

\[ \mathcal{P}_l = \Big\{ \pi \in \Pi(X) : \; (c_1 - c_2)'B_y P'\pi < 0, \; y \in Q_l; \quad (c_1 - c_2)'B_y P'\pi \ge 0, \; y \in Y - Q_l \Big\}. \tag{30} \]

Recall the local cost vectors c_a are defined in (8). Then from (13) it follows that M^π and hence
R^π is constant on each polytope 𝒫_l. Specifically, for rows y ∈ Q_l, M_{y1} = 1 and for rows
y ∈ Y − Q_l, M_{y2} = 1.

Although in general there are 2^Y − 1 possible R^π matrices, we now show that by introducing
assumptions (A1), (A2) and (S) below, there are only Y + 1 distinct local decision likelihood
matrices R^π. This forms an important preliminary step for characterizing the optimal global
decision policy.

Recalling the notation in Sec. II-A, we list the following assumptions.
(A1) The observation distribution B_{xy} = P(y|x) is TP2 (see Definition 6 in Appendix A), i.e.,
all second order minors of matrix B are non-negative.
(A2) The transition probability matrix P is TP2. (All second order minors of P are non-negative.)
(A3) The elements of vector C̄ in (24) are decreasing. A sufficient condition is that for j > i
and i ≥ 2 the false alarm vector f and delay penalty d satisfy f_i ≥ max{1, ρ f'P'e_i − d}
and f_j − f_i ≥ ρ f'P'(e_j − e_i).
(S) The local decision cost vector c_a in (8) is submodular. That is, the elements c(i, a) satisfy
c(1, 2) > c(1, 1) and c(2, 2) < c(2, 1). (Recall from Sec. II-A that c(2, a) = c(3, a) = · · · =
c(X, a) in quickest detection problems with PH-distributed change time.)
Discussion of Assumptions:
Assumption (A1): The requirement that P(y|x) is TP2 with respect to states {1, 2} and y ∈ Y
holds for numerous examples, see Karlin's classic book [22] and also [23]. Examples include
quantized Gaussians, quantized exponential distributions, Binomial, Poisson, etc. For example,
consider quantized Gaussians. Suppose

\[ B_{iy} = P(y \mid x = i) = \frac{b_{iy}}{\sum_{y=1}^{Y} b_{iy}}, \qquad \text{where } b_{iy} = \frac{1}{\sqrt{2\pi\Sigma}} \exp\Big( -\frac{(y - g_i)^2}{2\Sigma} \Big), \quad \Sigma > 0, \quad g_1 < g_2. \]

Then (A1) holds.

Assumption (A2) always holds trivially for X = 2. For X > 2, see [16], [25] for numerous
examples. Consider the tridiagonal transition probability matrix P with P_{ij} = 0 for j ≥ i + 2
and j ≤ i − 2. As shown in [16, pp. 99-100], a necessary and sufficient condition for tridiagonal
P to be TP2 is that P_{i,i} P_{i+1,i+1} ≥ P_{i,i+1} P_{i+1,i}. Such a diagonally dominant tridiagonal matrix
satisfies Assumption (A2).

Assumption (A3) is a sufficient condition for C̄(π, 2) to be decreasing in π with respect to
the monotone likelihood ratio order. We will use (A3) in Sec. V to obtain sufficient conditions
for a threshold policy. Assumption (A3) always holds for geometrically distributed change times
(X = 2). For PH-distributed change times (X > 2), Assumption (A3) can be viewed as a set of design
constraints that the decision maker needs to take into account so that quickest detection with
PH-distributed change times has a threshold policy [26]. Feasible values for the elements of f are
straightforwardly obtained using an LP solver such as linprog in Matlab.

Assumption (S) is only required for the problem to be non-trivial. If (S) does not hold and
c(i, 1) < c(i, 2) for i = 1, 2, then local decision a = 1 will always dominate decision a = 2
and the problem reduces to a standard quickest detection problem where the observed local
decision a = 1 yields no information about the state. Assumption (S) implies c(x, 2) − c(x, 1)
is decreasing in x ∈ {1, 2}, i.e., the local cost c(x, a) is submodular, which implies the zero
crossing condition that is important in the proof of Theorem 2.
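Assumptions (A1), (A2) and (S) are easy to check numerically for a given model. The following is a minimal sketch; the function names are hypothetical and the TP2 test simply enumerates all second order minors, as in the definition quoted above.

```python
import numpy as np

def is_tp2(matrix, tol=1e-12):
    """True if every second order minor of the matrix is non-negative."""
    m = np.asarray(matrix, dtype=float)
    rows, cols = m.shape
    for i in range(rows):
        for k in range(i + 1, rows):
            for j in range(cols):
                for l in range(j + 1, cols):
                    if m[i, j] * m[k, l] - m[i, l] * m[k, j] < -tol:
                        return False
    return True

def is_submodular(c):
    """Assumption (S) for local costs c[i, a]: rows are states 1, 2; columns are decisions 1, 2."""
    c = np.asarray(c, dtype=float)
    return bool(c[0, 1] > c[0, 0] and c[1, 1] < c[1, 0])

if __name__ == "__main__":
    B = [[0.85, 0.15], [0.15, 0.85]]         # symmetric observation matrix
    P = [[1.0, 0.0], [0.005, 0.995]]         # geometric change time
    c = [[1.0, 2.0], [-1.0, -3.57]]          # local costs
    print(is_tp2(B), is_tp2(P), is_submodular(c))
```

The quadruple loop costs O(X²Y²) operations, which is negligible for the small state and observation spaces considered here.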

The following theorem is an abbreviated version of Theorem 2 presented in Appendix O.
It will be used in the rest of the paper as a natural partition of the belief state space Π(X).
Recall that transition probability P, observation probability matrix B_y and local cost vector c_a
are defined in ©, ©, © respectively.

Theorem 2. Under (A1), (A2), (S), the belief state space Π(X) defined in (14) can be partitioned
into at most Y + 1 non-empty polytopes denoted 𝒫_1, . . . , 𝒫_{Y+1}, where

\[ \begin{aligned}
\mathcal{P}_1 &= \{\pi \in \Pi(X) : (c_1 - c_2)'B_1 P'\pi \ge 0\}, \\
\mathcal{P}_l &= \{\pi \in \Pi(X) : (c_1 - c_2)'B_{l-1} P'\pi < 0 \;\cap\; (c_1 - c_2)'B_l P'\pi \ge 0\}, \quad l = 2, \ldots, Y, \\
\mathcal{P}_{Y+1} &= \{\pi \in \Pi(X) : (c_1 - c_2)'B_Y P'\pi < 0\}.
\end{aligned} \tag{31} \]

On each such polytope, the local decision likelihood matrix R^π defined in (13) is a constant
with respect to belief state π. ■

As a consequence of Theorem 2 and (13), there are only Y + 1 possible decision likelihood
matrices R^π, one per polytope 𝒫_l, l = 1, . . . , Y + 1. We will denote these decision likelihood
matrices as

\[ R^l = R^\pi = B M^\pi = B M^l, \qquad \pi \in \mathcal{P}_l, \quad l = 1, \ldots, Y + 1. \tag{32} \]
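To make the construction concrete, the sketch below computes M^π and R^π = B M^π for a given public belief. It is an illustration under stated assumptions rather than the paper's implementation: it assumes that an agent with observation y picks local decision a = 1 exactly when (c_1 − c_2)'B_y P'π < 0 (the convention used in (30) and (31)), and the function names and the parameter values in the usage part are hypothetical choices for the case X = Y = A = 2.

```python
import numpy as np

def local_decision_matrix(pi, P, B, c):
    """M[y, a] = 1 if an agent holding public belief pi and observation y picks decision a.
    c[i, a] is the local cost; decision 1 is picked iff (c_1 - c_2)' B_y P' pi < 0."""
    pred = P.T @ pi                                      # P' pi
    delta = c[:, 0] - c[:, 1]                            # c_1 - c_2
    score = (B * (delta * pred)[:, None]).sum(axis=0)    # one value per observation y
    M = np.zeros((B.shape[1], 2))
    M[score < 0, 0] = 1.0
    M[score >= 0, 1] = 1.0
    return M

def decision_likelihood(pi, P, B, c):
    """R^pi = B M^pi, with R[i, a] the probability of local decision a in state i."""
    return B @ local_decision_matrix(pi, P, B, c)

if __name__ == "__main__":
    P = np.eye(2)
    B = np.array([[0.85, 0.15], [0.15, 0.85]])
    c = np.array([[1.0, 2.0], [-1.0, -3.57]])
    grid = np.linspace(0.0, 1.0, 201)
    distinct = {tuple(np.round(decision_likelihood(np.array([1 - p, p]), P, B, c), 12).ravel())
                for p in grid}
    print(len(distinct), "distinct decision likelihood matrices on the grid")
```

Scanning the belief grid and collecting the distinct matrices gives a quick empirical check that no more than Y + 1 of them occur when (A1), (A2), (S) hold.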

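Example: To give some insight into the structure of the decision likelihood matrix R^π, suppose
X = 2 (state space), Y = 3 (observation space), A = 2 (local decision space). Then assuming
(A1), (A2), (S), by Theorem 2 there are up to Y + 1 = 4 convex polytopes. The matrices M^l
defined in (13), (32) are

\[ M^1 = \begin{bmatrix} 0 & 1 \\ 0 & 1 \\ 0 & 1 \end{bmatrix}, \quad
M^2 = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 1 \end{bmatrix}, \quad
M^3 = \begin{bmatrix} 1 & 0 \\ 1 & 0 \\ 0 & 1 \end{bmatrix}, \quad
M^4 = \begin{bmatrix} 1 & 0 \\ 1 & 0 \\ 1 & 0 \end{bmatrix}. \tag{33} \]

Then from (32) the 4 possible decision likelihood matrices R^l are

\[ R^1 = \begin{bmatrix} 0 & 1 \\ 0 & 1 \end{bmatrix}, \quad
R^2 = \begin{bmatrix} B_{11} & B_{12} + B_{13} \\ B_{21} & B_{22} + B_{23} \end{bmatrix}, \quad
R^3 = \begin{bmatrix} B_{11} + B_{12} & B_{13} \\ B_{21} + B_{22} & B_{23} \end{bmatrix}, \quad
R^4 = \begin{bmatrix} 1 & 0 \\ 1 & 0 \end{bmatrix}. \tag{34} \]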
[Figure 2, panels (a) X = 3, Y = 4, A = 2 and (b) X = 2, Y = 4, A = 2.]

Fig. 2. Illustration of polytopes 𝒫_1, 𝒫_2, 𝒫_3, 𝒫_4 defined in (31) and hyperplanes η_1, η_2, η_3 defined in (35) for Y = 4, A = 2.
Theorem 2 ensures that the hyperplanes do not intersect within the simplex Π(X) and that on each polytope the local decision
likelihoods R^π are constant. In the figure, e_2, e_3 ∈ 𝒫_1. Assumption (PH)(ii) in Sec. V ensures this.



The detailed version of Theorem 2 in Appendix O guarantees that each of these matrices is TP2.
Fig. 2 illustrates these polytopes and the hyperplanes η_y defined below.

Let us give some intuition behind Theorem 2. Define the following Y hyperplanes that are
subsets of Π(X):

\[ \eta_y = \{\pi \in \Pi(X) : (c_1 - c_2)'B_y P'\pi = 0\}, \qquad y = 1, \ldots, Y. \tag{35} \]

The main intuition of the above theorem is that (A1), (A2), (S) imply that (c_1 − c_2)'B_y P'π
satisfies a single crossing condition [4] with respect to π, y, see Definition 5 in Appendix A.
This means that the set of belief states satisfy the following subset property:

\[ \{\pi : (c_1 - c_2)'B_y P'\pi \ge 0\} \subseteq \{\pi : (c_1 - c_2)'B_{y+1} P'\pi \ge 0\}. \tag{36} \]

This implies that the hyperplanes η_y, y ∈ Y, do not intersect within the simplex Π(X). It is
nice that straightforward conditions such as (A1), (A2), (S) ensure this. Otherwise dealing with
intersecting hyperplanes in a multi-dimensional simplex can be a real headache. Theorem 2(iv)
in Appendix O shows that each hyperplane η_y partitions Π(X) such that vertices e_1, e_2, . . . , e_{i_y}
lie on one side and e_{i_y+1}, . . . , e_X lie on the other side. In Sec. V we will introduce Assumption
(PH)(ii), which ensures that e_2, . . . , e_X always lie in polytope 𝒫_1, as illustrated in Fig. 2.
B. Multi-Threshold Structure of Social Learning based Quickest Detection

The main result (Theorem 3 and Corollary 1) below gives sufficient conditions under which
social learning based quickest detection has a double threshold policy. Consider the model
(P, B, c, C̄, ρ) in (22) with geometric change time:

\[ P = \begin{bmatrix} 1 & 0 \\ \epsilon & 1 - \epsilon \end{bmatrix} \tag{37} \]

with false-alarm vector f = (0, f_2)' in (17) and delay cost (18). Here the change probability
ε ≪ 1 is a small non-negative scalar. So the change time τ^0 is geometrically distributed with
E{τ^0} = 1/ε.

The analysis in this subsection proceeds as follows:
Step 1: For ε = 0, the problem becomes a simple sequential detection problem for state 1. We
explicitly characterize the multi-threshold behavior of the optimal decision policy in Theorem 3
below.

Step 2: It is then shown that for small ε, the optimal value function is within O(ε) of the value
function for the case of zero change probability (Corollary 1). So, the optimal policy computed
for zero change probability yields performance that is close to that of the optimal quickest
detection policy for small ε.

1) Step 1: Sequential Detection of State 1: In line with the above plan, consider the sequential
detection problem for state 1 with social learning formulated in Sec. II with

X = Y = A = {1, 2}, P = I. (38)

The state x is a random variable chosen at k = 0 with distribution π_0 and remains constant for
k > 0. The goal is to detect and announce state 1 if x = 1 based on noisy observations. The
global decision u_k = μ(π_k) ∈ {1 (stop), 2 (continue)} is a function of the public belief π_k. The
optimal policy μ*(π) that optimizes (20) satisfies Bellman's equation (25).

The 2-dimensional belief state π = [1 − π(2), π(2)] is parametrized by the scalar π(2) ∈ [0, 1],
i.e., Π(X) is the interval [0, 1]. Each hyperplane η_y in (35) now is a point on the interval [0, 1];
let the 2-dimensional vector [1 − η_y(2), η_y(2)] denote the belief state corresponding to η_y. The
polytopes 𝒫_1, 𝒫_2, 𝒫_3 in Theorem 2 are now intervals which are subsets of [0, 1]. If (A1) and
(S) hold, then 𝒫_3 = [0, η_2(2)), 𝒫_2 = [η_2(2), η_1(2)), 𝒫_1 = [η_1(2), 1].
Fig. 3. Structure of social learning filter under the assumptions of Lemma 2 and symmetric B. Right (left) arrows represent
evolution of the public belief when a = 2 (a = 1). As can be seen, η_1, η_2 are fixed points of the composite maps in (40).



To handle the discontinuity in the social learning filter (11), we start with the following lemma
that characterizes useful structural properties of the social learning filter. First define the belief
state

q = T^π(η_1, 1). (39)

Lemma 2. Consider the social learning filter (11) and assume (A1), (S) hold. Then:

(i) q = T^π(η_1, 1) = T^π(η_2, 2).

(ii) If B is symmetric, then η_1 and η_2 are fixed points of the composite Bayesian maps:

η_1 = T^π(T^π(η_1, 1), 2), η_2 = T^π(T^π(η_2, 2), 1). (40)

The implication of the above lemma is that Π(X) can be partitioned into 4 intervals, namely
[e_1, η_2), [η_2, q), [q, η_1) and [η_1, e_2]. Fig. 3 illustrates these regions and the dynamics specified
in Lemma 2. The main result below characterizes the structure of the optimal global decision
policy μ*(π) on these 4 intervals. The theorem also characterizes information cascades [12]
(more colloquially "herding"), which are a salient feature of social learning.
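Lemma 2 is easy to verify numerically in the scalar case. The sketch below is an illustration, not part of the paper: it assumes P = I, a symmetric B and local costs c chosen here for illustration, and it uses the update that holds on the middle interval, where the local decision a reveals the observation y = a, so that the filter reduces to an ordinary Bayes update of π(2). The names eta and update are hypothetical.

```python
import numpy as np

B = np.array([[0.85, 0.15], [0.15, 0.85]])   # symmetric observation matrix (illustrative values)
c = np.array([[1.0, 2.0], [-1.0, -3.57]])    # local costs c[i, a] (illustrative values)

def eta(y):
    """pi(2) solving (c_1 - c_2)' B_y pi = 0 when P = I; y is zero-based."""
    delta = c[:, 0] - c[:, 1]
    return -delta[0] * B[0, y] / (delta[1] * B[1, y] - delta[0] * B[0, y])

def update(p2, a):
    """Bayes update of pi(2) on the middle interval, where decision a reveals y = a (zero-based)."""
    num = B[1, a] * p2
    return num / (B[0, a] * (1.0 - p2) + num)

eta1, eta2 = eta(0), eta(1)
q = update(eta1, 0)                          # q = T(eta_1, a = 1), as in (39)

if __name__ == "__main__":
    print("eta_2(2), q(2), eta_1(2):", eta2, q, eta1)
    print("T(eta_2, a = 2):", update(eta2, 1))            # equals q by Lemma 2(i)
    print("T(T(eta_1, 1), 2):", update(q, 1))             # returns to eta_1 by Lemma 2(ii)
```

In odds form the reason is transparent: both η_1 and η_2 are mapped to the belief whose odds equal the cost ratio −(c_1 − c_2)(1)/(c_1 − c_2)(2), and symmetry of B makes the two likelihood ratios reciprocal.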

Theorem 3. Consider the sequential detection problem with parameters (38). Suppose agents
make local decisions via social learning. Assume (A1), (S) hold. (Note (A2) holds trivially since
P = I.) The optimal global decision policy μ*(π) has the following properties:
(i) For π ∈ 𝒫_1 ∪ 𝒫_3, the global decision policy has a threshold structure:

\[ \mu^*(\pi) = \begin{cases} 2 & \text{if } \pi(2) \ge \pi^*(2), \\ 1 & \text{otherwise}, \end{cases} \qquad \text{where } \pi^*(2) = \frac{d}{d + (1 - \rho) f_2}. \tag{41} \]

Also for π ∈ 𝒫_1 ∪ 𝒫_3, the value function (25) is V̄(π) = min{0, C̄(π, 2)/(1 − ρ)}, where C̄(π, 2)
is defined in (24).

(ii) The intervals 𝒫_1 and 𝒫_3 are "information cascades" [12]. That is, if π_k ∈ 𝒫_1 ∪ 𝒫_3, then
π_{k+1} = π_k and social learning ceases.

(iii) If B is symmetric, then for π ∈ 𝒫_2, the global decision policy has the following structure:

(a) For π ∈ [η_2(2), q(2)), V̄(π) is concave and there is at most one interval where μ*(π) = 1.

(b) For π ∈ [q(2), η_1(2)), V̄(π) is concave and there is at most one interval where μ*(π) = 1.


The implication of Part (iii) of the above theorem is that the stopping set S comprises
at most three intervals. One of these intervals is (π*(2), 1), with the threshold π*(2) defined in
(41). The second claim of the theorem follows since, if the public belief π ∈ 𝒫_1, then the optimal
local decision is a = 2 irrespective of the observation y. Similarly, if π ∈ 𝒫_3, then the optimal
local decision is a = 1 irrespective of the observation y. Therefore when the public belief is
in 𝒫_1 ∪ 𝒫_3, the local decision of an agent reveals no information about its local observation to
subsequent agents.
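The closed-form value function in Part (i) follows from the fact that the public belief is frozen inside an information cascade, so that the value iteration (27) collapses to a scalar recursion at each such belief. The sketch below illustrates this. It assumes P = I, ρ < 1 and C̄ = d e_1 − (1 − ρ) f as implied by (24); the function names are hypothetical and the recursion is started from 0 for simplicity, since the limit does not depend on the starting value when ρ < 1.

```python
def cascade_value(c_bar_pi, rho, n_iter=500):
    """Iterate v <- min(0, c_bar_pi + rho * v): (27) at a belief the filter leaves unchanged."""
    v = 0.0
    for _ in range(n_iter):
        v = min(0.0, c_bar_pi + rho * v)
    return v

def stage_cost(p2, d, f2, rho):
    """C_bar' pi = d * (1 - pi(2)) - (1 - rho) * f2 * pi(2) for P = I and f = (0, f2)'."""
    return d * (1.0 - p2) - (1.0 - rho) * f2 * p2

def threshold(d, f2, rho):
    """pi(2) at which C_bar' pi changes sign."""
    return d / (d + (1.0 - rho) * f2)

if __name__ == "__main__":
    d, f2, rho = 1.8, 2.0, 0.8               # illustrative values
    for p2 in (0.7, 0.9):
        cost = stage_cost(p2, d, f2, rho)
        print(p2, cascade_value(cost, rho), min(0.0, cost / (1.0 - rho)))
    print("threshold:", threshold(d, f2, rho))
```

The recursion converges to min{0, C̄'π/(1 − ρ)}, so continuing is optimal inside a cascade only where C̄'π is negative.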

2) Step 2: Quickest Time Detection bound for small ε: Given the characterization in Theorem 3
of the optimal policy for ε = 0, we now consider the quickest change detection problem for
small ε specified in (37). It is convenient to introduce the following ε-dependent notation.

Let V^{μ*_ε}(π) denote the cost incurred by the optimal policy μ*_ε with transition matrix

\[ P^\epsilon = \begin{bmatrix} 1 & 0 \\ \epsilon & 1 - \epsilon \end{bmatrix}. \]

We use the notation 𝒫^ε_l to denote the explicit dependence on ε of the 3 intervals 𝒫_1, 𝒫_2,
𝒫_3 defined in (31). For ε = 0, we denote these intervals as 𝒫^0_l. The following result bounds
the difference between V^{μ*_0}(π) and V^{μ*_ε}(π). Note that μ*_0(π) is characterized in Theorem 3 and
P^0 = I (identity matrix).

Recall from (23) that V(π) is the actual optimal expected cost associated with the optimal decision
policy μ*(π). As mentioned below (25), the transformed value function V̄(π) is more convenient
to deal with to prove the existence of optimal threshold policies, and the optimal policy remains
invariant to the transformation from V(π) to V̄(π).

Corollary 1. Consider the social learning based quickest detection model (P, B, c, C̄, ρ) in (22)
with probability of change specified in (37). Then, for initial belief π ∈ 𝒫^ε_l ∩ 𝒫^0_l, l = 1, 2, 3,
the optimal policy μ*_0 (characterized in Theorem 3) incurs a total global cost V^{μ*_0}(π) that
constitutes an O(ε) upper-bound to the optimal global cost V^{μ*_ε}(π) incurred in the quickest
detection problem. More specifically, for π ∈ 𝒫^ε_l ∩ 𝒫^0_l, l = 1, 2, 3,

V^iir) ~ V^ic) < -Jf^ m ax(d,f 2 ). ■ (42) 

Discussion: The implication of (42) is that the simple policy μ*_0(π) of Theorem 3 is near
optimal for quickest time detection with social learning when ε is small. Note that (42) compares
the optimal costs in regions π ∈ 𝒫^ε_l ∩ 𝒫^0_l, l = 1, 2, 3, so we are omitting intervals where the
models have different local decision likelihood probabilities R^π. The regions we are omitting
are O(ε) in size. In each region π ∈ 𝒫^ε_l ∩ 𝒫^0_l, the only difference between the quickest detection
model and the simplified model is the transition matrix (P^ε vs P^0). This allows us to give a
tight bound in the sense that for ε = 0, the optimal costs V^{μ*_0}(π) and V^{μ*_ε}(π) coincide. Of course,
(42) requires the discount factor ρ < 1. We refer the reader to [48] for an alternative and more
general approach.

The proof of Corollary 1 follows from Theorem 2 of [42]. In terms of our notation, Theorem
2 of [42] shows that for a POMDP with piecewise linear value function at each iteration of the
value-iteration algorithm, for π ∈ 𝒫^ε_l ∩ 𝒫^0_l,

V^{<k) < V^tt) + ^^HOMIUsup ||[P e - P% R% (43) 

where the || · ||_1 induced matrix norm is with respect to the (j, a) elements. Since from Theorem 3
the value function is piecewise linear, (43) applies. From the structure of P^ε in (37) and since
P^0 = I, clearly

\[ \sup_{l} \big\| [P^\epsilon - P^0]' R^l \big\|_1 = \epsilon \max(B_{11} + B_{21},\; B_{12} + B_{22}) \le 2\epsilon. \]

Also ||C(π, u)||_∞ = max(d, f_2). Substituting these in (43) yields the bound (42).
[Figure 4, panels (a) Optimal global decision policies μ*_0(π) and μ*_ε(π) and (b) Value functions for global decision policy.]

Fig. 4. Optimal decision policy for quickest time change detection with social learning for geometric distributed change time
with small probability of change. In each sub-figure, the graph with solid lines is for ε = 0.005 and the graph with broken
lines is for ε = 0. The policies and optimal costs are very close for ε = 0.005 and ε = 0. Equation (42) gives a bound for the
difference in the optimal costs. In both cases, the optimal policies are a double threshold and the value functions are non-concave
and discontinuous.

3) Numerical Example: Consider the social learning quickest detection model (P, B, c, C̄, ρ)
with X = Y = A = {1, 2},

\[ B = \begin{bmatrix} 0.85 & 0.15 \\ 0.15 & 0.85 \end{bmatrix}, \qquad c = \begin{bmatrix} 1 & 2 \\ -1 & -3.57 \end{bmatrix}, \qquad \rho = 0.8, \quad d = 1.8, \quad f_2 = 2. \tag{44} \]

Fig. 4 shows the optimal policies μ*_0 (Theorem 3) and μ*_ε (optimal quickest detection policy)
together with optimal costs V^{μ*_0}(π) and V^{μ*_ε}(π) for change probability ε = 0.005. As can be
seen, the quickest detection optimal policy and costs are very close to the costs and policies
specified by Theorem 3. For ε = 0.002 the policies μ*_0 and μ*_ε are almost identical and cannot
be distinguished in Fig. 4. The policies and optimal costs were obtained by running the value
iteration algorithm for horizon 500 with Π(X) = [0, 1] discretized to a grid of 100 points.
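A grid-based value iteration of this kind can be sketched as follows. This is a minimal illustration and not the author's code. It assumes that the social learning filter has the form T^π(π, a) = R^π_a P'π / σ(π, a) with R^π = B M^π, that local decision a = 1 is chosen when (c_1 − c_2)'B_y P'π < 0, that the parameter values are B = [0.85 0.15; 0.15 0.85], c = [1 2; −1 −3.57], ρ = 0.8, d = 1.8, f_2 = 2 as read from (44), and that off-grid beliefs are handled by linear interpolation (the text does not say how). The function name value_iteration is hypothetical.

```python
import numpy as np

def value_iteration(eps, B, c, rho, d, f2, n_grid=100, horizon=500):
    """Grid approximation of (27) for X = Y = A = 2.
    Returns the grid of pi(2), the transformed value function and the policy (1 = stop, 2 = continue)."""
    P = np.array([[1.0, 0.0], [eps, 1.0 - eps]])
    f = np.array([0.0, f2])
    c_bar = d * np.array([1.0, 0.0]) - (np.eye(2) - rho * P) @ f   # transformed cost, as in (24)
    delta = c[:, 0] - c[:, 1]                                      # c_1 - c_2
    grid = np.linspace(0.0, 1.0, n_grid)
    beliefs = np.stack([1.0 - grid, grid], axis=1)                 # rows are [pi(1), pi(2)]
    pred = beliefs @ P                                             # rows are (P' pi)'
    score = (pred * delta) @ B                                     # (c_1 - c_2)' B_y P' pi per (belief, y)
    M = np.stack([score < 0, score >= 0], axis=2).astype(float)    # M[n, y, a]
    R = np.einsum("xy,nya->nxa", B, M)                             # R[n, x, a] = sum_y B[x, y] M[n, y, a]
    unnorm = pred[:, :, None] * R                                  # R_a P' pi
    sigma = unnorm.sum(axis=1)                                     # sigma(pi, a)
    next_p2 = unnorm[:, 1, :] / np.where(sigma > 0, sigma, 1.0)    # pi(2) after observing decision a
    stage = beliefs @ c_bar                                        # C_bar' pi
    V = -(beliefs @ f)                                             # initial condition -C(pi, 1) = -f' pi
    for _ in range(horizon):
        cont = stage + rho * sum(
            sigma[:, a] * np.interp(next_p2[:, a], grid, V) for a in range(2))
        V = np.minimum(0.0, cont)
    policy = np.where(cont > 0.0, 1, 2)                            # stop where continuing costs more than 0
    return grid, V, policy

if __name__ == "__main__":
    B = np.array([[0.85, 0.15], [0.15, 0.85]])
    c = np.array([[1.0, 2.0], [-1.0, -3.57]])
    for eps in (0.0, 0.005):
        grid, V, policy = value_iteration(eps, B, c, rho=0.8, d=1.8, f2=2.0)
        print("eps =", eps, "stop on", int((policy == 1).sum()), "of", len(grid), "grid points")
```

Because the value function is discontinuous at the polytope boundaries, interpolation smears the jumps over one grid cell; a finer grid reduces this effect at proportional cost.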



V. Quickest Time Detection for Geometric and PH-distributed Change Time 

The previous section illustrated the multi-threshold behavior of social learning based quickest
time change detection. What sufficient conditions on the social learning model lead to single
threshold behavior? This section gives such conditions for PH-distributed change times τ^0
modelled by an X ≥ 2-state Markov chain. For geometric change times (i.e., X = 2) these
conditions yield a threshold that is identical to the classical Kolmogorov-Shiryaev criterion (21).

This section comprises the following results:
(i) Sec. V-A gives sufficient conditions for the optimal global decision policy μ* to be myopic
and characterized by a linear hyperplane threshold.
(ii) Sec. V-B gives less restrictive conditions under which the optimal policy is increasing with
respect to the monotone likelihood ratio (MLR) order and is characterized by a single threshold
curve. Recall that for PH-distributed change time, the belief space Π(X) is a multi-dimensional
simplex. To order posterior distributions on this simplex, the MLR stochastic order (which is a
partial order) will be used since it is preserved under conditional expectations. The results involve
analysis of the structure of the social learning Bayesian filter together with lattice programming.
All definitions of these orders and consequences are given in the Appendix.
(iii) Sec. V-C describes how sufficient conditions can be given for multiple-threshold policies.
(iv) Finally, Sec. V-D characterizes the optimal linear approximation to the MLR increasing
policy. It then formulates estimation of the optimal linear approximation to the threshold curve
as a stochastic optimization problem.

Assumption (PH): Recall that the fictitious states 2, . . . , X (corresponding to belief states e_2, . . . , e_X)
are used to model the PH-distribution in ©. It therefore makes sense to constrain the model
parameters so that the global decision policy μ*(π) at the belief states e_2, . . . , e_X is identical
(and similarly for the local decisions taken in social learning). Throughout this section, when
considering PH-distributed change times, we make the following assumption.
(PH) (i) C̄'e_i < 0 for i = 2, . . . , X. (ii) e_2, . . . , e_X lie in polytope 𝒫_1.
Assumption (PH)(i) says that the optimal policy μ*(π) treats each of the fictitious states 2, . . . , X
identically: they all lie outside the stopping set S. In a similar vein, (PH)(ii) requires that
individual agents making local decisions treat the fictitious states i = 2, . . . , X identically, i.e.,
they lie to the left of each hyperplane η_y, y = 1, . . . , Y.

Obviously, (PH) holds trivially for X = 2 (geometric case); otherwise the quickest change
problem would be degenerate.
A. Case 1: Myopic Quickest Detection with Linear Hyperplane Threshold

The main result of this subsection is Theorem 4, which shows that under suitable conditions
the optimal policy $\mu^*(\pi)$ has a myopic structure characterized by the hyperplane $C'\pi = 0$.
Recall that $C$ in (24) denotes the transformed costs of the global decision maker with elements
$C_j$, $j = 1, \ldots, X$. Denote the $X - 1$ vertices of the intersection of the linear hyperplane
$\{\pi : C'\pi = 0\}$ with the facets of simplex $\Pi(X)$ as $\nu_j$, $j = 1, \ldots, X - 1$. Then it is
straightforwardly seen that these vertices are

$$\nu_j = \frac{C_{j+1}}{C_{j+1} - C_1}\, e_1 - \frac{C_1}{C_{j+1} - C_1}\, e_{j+1}, \qquad j = 1, \ldots, X - 1. \tag{45}$$

Now introduce the following assumption:
(C1) $C' R_a^{\nu_j} P' \nu_j \ge 0$ for all $a \in A$, $j = 1, \ldots, X - 1$.

The relevance of (C1) is apparent from the following lemma (proof in Appendix E). Define
the set of belief states (polytope)

$$\bar{\mathcal{S}} = \{\pi : C'\pi \ge 0\}. \tag{46}$$

Lemma 3. (C1) together with (A1), (A2), (A3), (PH) are sufficient for the set $\bar{\mathcal{S}}$ defined in (46)
to be closed under the social learning filter. That is, $\pi \in \bar{\mathcal{S}} \implies T^\pi(\pi, a) \in \bar{\mathcal{S}}$ for all
$a \in A$.

Recall (A1), (A2), (A3) were introduced in Sec. IV-A and (PH) at the beginning of Sec. V.
The main result is as follows. The proof is in the Appendix.

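Theorem 4. Consider the social learning based quickest time detection model $(P, B, c, C, \rho)$ in
(22). Assume (A1), (A2), (A3), (S), (C1), (PH). Then the global decision maker's optimal policy
is myopic and is of the form

$$\mu^*(\pi) = \begin{cases} 1 \text{ (stop)} & \text{if } C'\pi \ge 0 \\ 2 \text{ (continue)} & \text{otherwise} \end{cases}, \qquad \mathcal{S} = \{\pi : C'\pi \ge 0\}. \tag{47}$$

For the special case $X = 2$ (geometric change time),

$$\mu^*(\pi) = \begin{cases} 1 \text{ (stop)} & \text{if } \pi(2) \le \pi^*(2) \\ 2 \text{ (continue)} & \text{if } \pi(2) > \pi^*(2) \end{cases}, \qquad \text{where } \pi^*(2) = \frac{d}{d + f_2 (1 - \rho P_{22})}. \tag{48}$$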
Fig. 5. Illustration of Theorem 4. The shaded region, which depicts the polytope $\{\pi : C'\pi \ge 0\}$, is equivalent to the stopping
set $\mathcal{S}$ under the assumptions of the theorem.



The above result is similar to the entry fee optimal stopping problem having a myopic policy 
discussed in ET1 pp.389] and PT1 Theorem 2.2, pp.54]. It is important to note, however, that even 
though the optimal policy $\mu^*(\pi)$ in (47) is characterized by a linear threshold, the value function
$V(\pi)$ can still be discontinuous and non-concave (unlike classical stopping time problems). This
is illustrated in the numerical example below.

Let us illustrate what Theorem 4 says. Consider Fig. 5. The shaded region in Fig. 5 denotes the
set $\bar{\mathcal{S}} = \{\pi : C'\pi \ge 0\}$. It is clear from Bellman's equation (25) that the stopping set
$\mathcal{S}$ is a subset of this shaded region $\bar{\mathcal{S}}$. What Theorem 4 says is that the stopping set is
equal to the shaded region, i.e., $\mathcal{S} = \bar{\mathcal{S}}$, if (C1) and (PH) hold. In terms of Fig. 5, (C1) is
sufficient for $T^\pi(\pi, a)$ to map the belief states $\nu_1$ and $\nu_2$ (which are the vertices of the line
$C'\pi = 0$) to polytope $\bar{\mathcal{S}}$. (PH)(i) implies that states $e_2$, $e_3$ lie to the left of the line
$C'\pi = 0$ (which corresponds to the region $C'\pi < 0$). Similarly, (PH)(ii) means that $e_2$, $e_3$ lie
to the left of each line segment $\eta_y$, $y = 1, 2$, i.e., $e_2, e_3 \in \mathcal{P}_1$.

Numerical Example: To illustrate Theorem 4, consider the geometric change time model in
(44) except that $P_{22} = 0.75$. Even though the sufficient condition (C1) does not hold, the optimal
policy is characterized by a single threshold given by (48). This is shown in Fig. 6. As can be
seen in Fig. 6, the value function is non-concave and discontinuous.
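As a rough numerical companion to (48), the following Python sketch evaluates the geometric-case threshold $\pi^*(2) = d / (d + f_2(1 - \rho P_{22}))$ and the resulting stop/continue rule. The function names are illustrative only, and the values in the usage note are borrowed from Example 1 of Sec. VII ($\rho = 0.99$, $d = 1.25$, $f_2 = 3$, $P_{22} = 0.95$); they are not the parameters of the numerical example above.

```python
def myopic_threshold(d, f2, rho, p22):
    """Threshold pi*(2) of (48) for geometric change times (X = 2).

    d   : delay cost
    f2  : false alarm cost for stopping in state 2 (no change yet)
    rho : discount factor
    p22 : probability of remaining in state 2
    """
    return d / (d + f2 * (1.0 - rho * p22))


def myopic_policy(pi2, d, f2, rho, p22):
    """Return 1 (stop) if pi(2) <= pi*(2), otherwise 2 (continue)."""
    return 1 if pi2 <= myopic_threshold(d, f2, rho, p22) else 2
```

For instance, `myopic_threshold(1.25, 3.0, 0.99, 0.95)` gives a threshold of roughly 0.875, so under these assumed parameters the rule stops unless the belief that no change has occurred exceeds about 0.875.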
[Figure 6, panels: (a) Optimal global decision policy $\mu^*(\pi)$ versus $\pi(2)$; (b) Value functions for global decision policy.]

Fig. 6. Numerical example illustrating Theorem 4, which characterizes the optimal decision policy for social learning based
quickest detection. The example is described in Sec. V-A. Even though the value function is non-concave and discontinuous, the
optimal policy has a single threshold specified by (48).



B. Case 2: Existence of a single threshold switching curve 

In this subsection, we consider another special case of the social learning based quickest
detection model (22). Theorem 5 below shows that the stopping set is characterized by a single
threshold curve on the belief space. The threshold coincides with that of the classical quickest
time detection problem with non-informative observations. For PH-distributed change times, unlike
the previous subsection, the threshold curve is not necessarily linear. We give a stochastic gradient
algorithm to estimate this threshold curve in Sec. V-D.

1) Structural Result: We make the following assumptions. Recall the global decision maker's
cost vector $C$ is defined in (24). Let $\bar{\nu}_j$, $j = 1, \ldots, X - 1$ denote the $X - 1$ vertices of the
intersection of hyperplane $\eta_Y$ (defined in (35)) with $\Pi(X)$. These vertices are computed as in (45)
with $C$ replaced by $P B_Y (c_1 - c_2)$.
(C2) $(c_1 - c_2)' B_Y (P')^2 \bar{\nu}_j \le 0$ for $j = 1, \ldots, X - 1$.
(C3) The linear hyperplane $\{\pi : C'\pi = 0\}$ lies in polytope $\mathcal{P}_{Y+1}$.

The following is the main result. The proof is in Appendix F.

Theorem 5. Consider the social learning based quickest detection model $(P, B, c, C, \rho)$ in (22).
Assume (A1), (A2), (S) and (PH) hold. The optimal policy $\mu^*(\pi)$ has the following structure:
(i) Under (C3), $\mu^*(\pi) = 2$ for $\pi \notin \mathcal{P}_{Y+1}$.

(ii) Under (C2) and (C3), the stopping set $\mathcal{S}$ is a convex subset of polytope $\mathcal{P}_{Y+1}$. Therefore
the boundary of $\mathcal{S}$ is differentiable almost everywhere.

(iii) For geometric-distributed change time ($X = 2$), under (C3), the optimal policy is identical
to that of the Kolmogorov-Shiryaev criterion (21) with uniformly distributed observation
probabilities.
(iv) Under (A3), (C2), (C3), on the polytope $\mathcal{P}_{Y+1}$, $\mu^*(\pi)$ has the following structure:

$$\pi_1, \pi_2 \in \mathcal{P}_{Y+1} \text{ and } \pi_1 \ge_r \pi_2 \text{ implies } \mu^*(\pi_1) \ge \mu^*(\pi_2). \tag{49}$$

(The MLR order $\ge_r$ is defined in (67) in Appendix A.) Hence the boundary of the stopping
set $\mathcal{S}$ within $\Pi(X)$ intersects any line segment $l(e_1, \bar{\pi})$ or $l(e_X, \bar{\pi})$ at most once (see the
geometric interpretation below). ■

Even though the policy $\mu^*(\pi)$ in Theorem 5 coincides with that of classical quickest detection,
the optimal cost incurred is always larger, as shown by the Blackwell dominance result.

2) Discussion of Theorem 5 and assumptions: Assumption (C3) localizes the decision threshold
to polytope $\mathcal{P}_{Y+1}$. As a consequence of (C3), $C'\pi < 0$ on all polytopes except $\mathcal{P}_{Y+1}$.
Therefore on these polytopes, $\mu^*(\pi) = 2$. Thus statement (i) is obvious.

Assumption (C2) together with (A1), (A2), (S) and (PH) ensures that the polytope $\mathcal{P}_{Y+1}$ is
closed under the belief state mapping $T^\pi(\pi, a)$. That is, $\pi \in \mathcal{P}_{Y+1}$ implies
$T^\pi(\pi, a) \in \mathcal{P}_{Y+1}$ for all $a$. Note that Assumption (C2) holds trivially for $X = 2$, as shown
in footnote 8. (C2) is similar in spirit to (C1) of Sec. V-A; the key difference is that (C1) deals
with the global cost vector $C$ whereas (C2) deals with the local costs $c_1$, $c_2$.

Assumptions (C2) and (C3) allow us to show that the value function $V(\pi)$ is concave on
$\mathcal{P}_{Y+1}$. Statement (ii), namely convexity of the stopping set $\mathcal{S}$, then follows from this concavity.

Statement (iii) is straightforward to show. The local decision likelihood probabilities on
$\mathcal{P}_{Y+1}$ are uniform since the local decision yields no information about the state. Thus under
(C3) the threshold is identical to the classical quickest detection threshold for the
Kolmogorov-Shiryaev criterion (21) with uniformly distributed observation probabilities.

Footnote 8: For $X = 2$, the second element of $P'\pi$ is $P_{22}\pi_2$, which is always smaller than $\pi_2$. So
applying $P$ to any belief state keeps it within the interval $\mathcal{P}_{Y+1}$.

The proof of Statement (iv) is more involved and is given in the Appendix. The proof uses
structural properties of the Bayesian social filter studied in Theorem 10 of the Appendix, along
with submodularity, the MLR stochastic order and a version of it defined on the line segments
$l(e_1, \bar{\pi})$ and $l(e_X, \bar{\pi})$ in Appendix A.

3) Geometric Interpretation of Statement (iv): Since Statement (iv) is non-trivial, let us explain
what it says from a geometric point of view. For PH-distributed change times with $X > 2$,
Statement (iv) says a lot more than convexity of the stopping region $\mathcal{S}$. On the unit simplex
$\Pi(X)$ define $l(e_1, \bar{\pi})$ as the line segment constructed from $e_1$ to any point $\bar{\pi}$ on the opposite
facet $\{e_2, \ldots, e_X\}$ of the simplex $\Pi(X)$. Similarly denote $l(e_X, \bar{\pi})$ as any line segment from
$e_X$ to any point $\bar{\pi}$ on the opposite facet $\{e_1, \ldots, e_{X-1}\}$. Statement (iv) implies that the
boundary of the stopping set $\mathcal{S}$ within $\Pi(X)$ intersects any such line $l(e_1, \bar{\pi})$ or $l(e_X, \bar{\pi})$ at
most once. Fig. 7(a) shows an example of a convex set that violates this condition. Statement (iv)
also leads to the following geometrical interpretation. If a belief state $\pi \in \mathcal{S}$ lies on a line
$l(e_X, \bar{\pi})$, then all belief states on this line closer to $\bar{\pi}$ also lie in $\mathcal{S}$. Similarly if a belief
state $\pi \in l(e_1, \bar{\pi})$ lies outside the stopping set $\mathcal{S}$, then all belief states on the line
$l(e_1, \bar{\pi})$ further away from $e_1$ also lie outside the stopping set.

Numerical examples are given in Sec. VII.

C. Extensions of Theorem 5 and Multi-threshold Policies

1) Local and Global Costs in Global Decision Making: Theorem 5 can be extended to
consider a more general cost function for the global decision maker (instead of only false alarm
and delay) which takes into account the cost of local decisions in social learning. For example,
suppose that the global decision maker's cost for picking decision $u = 2$ (continue) is the delay
cost plus an "operating cost". That is,

$$C(\pi, 2) = d\, e_1'\pi + \beta\, C_{op}(\pi). \tag{50}$$

Here $\beta \ge 0$ is a user defined constant and, with $\sigma(\cdot)$, $T(\cdot)$, $B_y$ as defined earlier,

$$C_{op}(\pi) = \mathbb{E}_y\Big\{\min_{a \in A} \mathbb{E}\{c(x, a) \mid \pi, y\}\Big\} = \sum_{y \in Y} \min_{a \in A} \{c_a' T(\pi, y)\}\, \sigma(\pi, y) = \sum_{y \in Y} \min_{a \in A} c_a' B_y P'\pi. \tag{51}$$
Fig. 7. The stopping set $\mathcal{S}$ in Fig. 7(a) violates Statement (iv) of Theorem 5 since the boundary of the stopping set $\mathcal{S}$ within $\Pi(X)$
intersects the line $l(e_3, \bar{\pi})$ twice. Fig. 7(b) shows an example of a stopping set $\mathcal{S}$ that satisfies Statement (iv) of Theorem 5. In
both figures, the region to the right of line $\eta_Y$ is the polytope $\mathcal{P}_{Y+1}$.



$C_{op}(\pi)$ is the expected operating cost since it is incurred at each agent $k$ when it makes its
local decision via social learning. Note that $C_{op}(\pi)$ is the expected local cost from choosing
decision $u = 2$, receiving signal $y$, picking recommendation $a$ and broadcasting the information to
the network: the probability of the event is $\sigma(\pi, y)$ and the cost is $\min_a c_a' T(\pi, y)$. The last
equality in (51) follows since $\sigma(\pi, y)$ is a non-negative scalar independent of $a$. The above
choice of $C_{op}(\pi)$ is very similar to that used in constrained social learning in [12, Chapter 4].
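To make the last expression in (51) concrete, here is a minimal Python sketch that evaluates $C_{op}(\pi) = \sum_y \min_a c_a' B_y P'\pi$ for a finite observation alphabet. The function name and the list-of-lists layout of `P`, `B` and `c` are assumptions made for illustration; $B_y$ is taken to be the diagonal matrix of observation probabilities for symbol $y$.

```python
def operating_cost(pi, P, B, c):
    """Expected operating cost C_op(pi) = sum_y min_a c_a' B_y P' pi.

    pi : belief over the X states (list of length X)
    P  : X x X transition matrix, P[i][j] = P(x_{k+1} = j | x_k = i)
    B  : X x Y observation matrix, B[x][y] = P(y | x)
    c  : X x A local cost matrix, c[x][a]
    """
    X, Y, A = len(pi), len(B[0]), len(c[0])
    # One-step predicted belief P' pi
    pred = [sum(P[i][j] * pi[i] for i in range(X)) for j in range(X)]
    total = 0.0
    for y in range(Y):
        # Unnormalized private belief B_y P' pi for observation y
        unnorm = [B[x][y] * pred[x] for x in range(X)]
        # Cheapest local action for this observation
        total += min(sum(c[x][a] * unnorm[x] for x in range(X)) for a in range(A))
    return total
```

When every action has the same cost vector $c_1$, the minimization is vacuous and the sum collapses to $c_1' P'\pi$, which is the form used on polytope $\mathcal{P}_{Y+1}$ in the text.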

Then using the same transformation as in (24), the optimal policy is given by Bellman's
equation (25) with $C(\pi, 2) = C'\pi + C_{op}(\pi)$. Assumption (C3), namely
$\{\pi : C(\pi, 2) = 0\} \subset \mathcal{P}_{Y+1}$, is then equivalent to the linear hyperplane
$(C + P c_1)'\pi = 0$ lying in polytope $\mathcal{P}_{Y+1}$. This is because on polytope $\mathcal{P}_{Y+1}$ the
optimal local decision is $a = 1$, see (34), and so $C_{op}(\pi) = c_1' P'\pi$. Suppose Assumption (A3) is
augmented with the condition that $c(i, a = 1)$ is decreasing in $i$. Then Theorem 5 continues to hold.

2) Multiple Thresholds: Using a similar proof to Theorem 5, sufficient conditions can be
given for the optimal global policy $\mu^*(\pi)$ in social learning based quickest detection to have
multiple thresholds. We describe this below.

Suppose the hyperplane $C'\pi = 0$ lies in polytope $\mathcal{P}_{y^*}$ for some $y^* \in \{1, 2, \ldots, Y + 1\}$.
Assume (C2) holds. Also assume the following generalization of (C2) holds.
(C2') The social learning filter maps belief states in polytope $\mathcal{P}_y$ to polytope $\mathcal{P}_{y+1}$
for $y = y^*, y^* + 1, \ldots, Y$. That is, $\pi \in \mathcal{P}_y$ implies $T^\pi(\pi, a) \in \mathcal{P}_{y+1}$.
Then similar to the proof of Theorem 5, the following result can be established (proof omitted).

Theorem 6. Under (A1), (A2), (A3), (S), (PH), (C2), (C2'), the value function $V(\pi)$ is MLR
decreasing and therefore the optimal policy $\mu^*(\pi)$ is MLR increasing on each polytope $\mathcal{P}_y$,
$y = y^*, \ldots, Y + 1$.

As a result, $\mu^*(\pi)$ is characterized by up to $Y + 2 - y^*$ threshold curves, one on each of
these polytopes. The reason is that even though $V(\pi)$ is decreasing in each polytope, there is no
guarantee that it is decreasing between polytopes. Theorem 5 is a special case of the above result
when $y^* = Y + 1$, in which case $\mu^*(\pi)$ is characterized by a single threshold curve.

As an example, consider $X = 2$, $Y = 2$, $A = 2$ and suppose $C'\pi = 0$ lies in $\mathcal{P}_2$, i.e., $y^* = 2$.
Since $X = 2$ (geometric change time), conditions (A2), (A3), (PH) and (C2) hold trivially. (C2')
holds if the social learning filter $T^\pi(\cdot)$ maps the belief states in $\mathcal{P}_2$ to $\mathcal{P}_3$. A sufficient
condition for this is $T^\pi(\eta_1, 2) \in \mathcal{P}_3$, i.e., the transition matrix satisfies

p > B 12 (eg, 1) - c(2, 2))B 21 B l2 - (c(l, 1) - c(2, l))B u B 22 
22 -B 22 B n ~ (c(2,l)-c(2,2))S 22 -(c(l,l)-c(l,2))5 21 

If (A1) and (52) hold, then according to Theorem 6 the optimal policy $\mu^*(\pi)$ is monotone
decreasing on each of the intervals $\mathcal{P}_2$ and $\mathcal{P}_3$. So $\mu^*(\pi)$ is characterized by up to 2
thresholds, one in each of these intervals.

D. Optimal Linear Decision Threshold and Algorithms 

Theorem 5 showed that under conditions (A1), (A2), (A3), (S), (PH), (C2), (C3), the optimal
decision policy $\mu^*(\pi)$ is MLR increasing in the belief state $\pi \in \mathcal{P}_{Y+1}$. In this section, we
characterize linear threshold hyperplanes that preserve this MLR structure. Such linear thresholds
can then be computed via a stochastic approximation algorithm. For geometric distributed change
time $\tau^0$, since the thresholds are points, estimation is an obvious special case.

Throughout this section we assume that the conditions of Theorem 5 hold.

1) Characterization of MLR increasing linear threshold: For $\pi \in \mathcal{P}_{Y+1}$, define the
$(X - 1)$-dimensional parameter vector $\theta = (\theta(1), \ldots, \theta(X - 1))'$. Since $\Pi(X) \subset \mathbb{R}^{X-1}$,
a linear hyperplane on $\Pi(X)$ is parametrized by $X - 1$ coefficients. Define the linear threshold
policy $\mu_\theta(\pi)$ parametrized by the vector $\theta$ as

$$\mu_\theta(\pi) = \begin{cases} \text{stop} = 1 & \text{if } \pi(2) + \sum_{i=1}^{X-2} \theta(i)\, \pi(i + 2) \le \theta(X - 1) \\ \text{continue} = 2 & \text{otherwise.} \end{cases} \tag{53}$$

Assume conditions (A1), (A2), (A3), (S), (PH), (C2), (C3) hold for the quickest detection
problem (20) so that from Theorem 5, the optimal policy $\mu^*(\pi)$ is MLR increasing on lines
$l(e_X, \bar{\pi})$ and $l(e_1, \bar{\pi})$. These are defined in Appendix A. The requirement that state 1 lies in the
stopping set means $\mu_\theta(e_1) = 1$, which implies $\theta(X - 1) \ge 0$.

Theorem 7. For belief states $\pi \in \Pi(X)$, the linear threshold policy $\mu_\theta(\pi)$ defined in (53) is
(i) MLR increasing on lines $l(e_X, \bar{\pi})$ iff $\theta(X - 2) \ge 1$ and $\theta(i) \le \theta(X - 2)$ for $i < X - 2$.
(ii) MLR increasing on lines $l(e_1, \bar{\pi})$ iff $\theta(i) \ge 0$ for $i < X - 2$. ■

The proof of Theorem 7 is in Appendix G. The constraints in the above theorem are necessary
and sufficient for the linear threshold policy (53) to be MLR increasing on lines $l(e_X, \bar{\pi})$ and
$l(e_1, \bar{\pi})$. Under these constraints, (53) defines the set of all MLR increasing linear threshold
policies on $l(e_X, \bar{\pi})$ and $l(e_1, \bar{\pi})$: it does not leave out any MLR increasing policies, nor does
it include any non-MLR increasing policies. In this sense, optimizing over the space of MLR
increasing linear threshold policies yields the optimal linear approximation to the threshold curve.
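The linear threshold policy (53) and the parameter constraints of Theorem 7 can be sketched in a few lines of Python. This is an illustrative sketch only: the function names are hypothetical, beliefs and parameters are plain lists with `pi[0]` holding $\pi(1)$ and `theta[0]` holding $\theta(1)$, all inequalities are treated as non-strict, and the boundary of the threshold is assigned to "stop" (both of these are assumptions).

```python
def linear_threshold_policy(pi, theta):
    """Policy (53): 1 (stop) if pi(2) + sum_i theta(i) pi(i+2) <= theta(X-1), else 2."""
    X = len(pi)
    if len(theta) != X - 1:
        raise ValueError("theta must have X - 1 entries")
    # theta[i] is theta(i+1) and pi[i + 2] is pi(i+3), for i = 0, ..., X-3
    lhs = pi[1] + sum(theta[i] * pi[i + 2] for i in range(X - 2))
    return 1 if lhs <= theta[X - 2] else 2


def satisfies_mlr_constraints(theta):
    """Check theta(X-2) >= 1, 0 <= theta(i) <= theta(X-2) for i < X-2, theta(X-1) >= 0."""
    if len(theta) < 2:
        raise ValueError("requires X >= 3, i.e. at least two parameters")
    slope = theta[-2]    # theta(X-2)
    offset = theta[-1]   # theta(X-1)
    inner = theta[:-2]   # theta(i) for i < X-2
    return slope >= 1 and offset >= 0 and all(0 <= t <= slope for t in inner)
```

For $X = 3$ and the hypothetical parameters `theta = [1.5, 0.4]`, the policy stops at $e_1$ and continues at $e_2$ and $e_3$, and the constraint check passes because $\theta(1) \ge 1$ and $\theta(2) \ge 0$.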

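The conditions imposed on the linear threshold parameters $\theta$ in Theorem 7 have a nice
interpretation when $X = 3$. Recall in this case $\Pi(X)$ is an equilateral triangle. Let $(\omega(1), \omega(2))$
denote Cartesian coordinates in the equilateral triangle. So $\pi(2) = 2\omega(2)/\sqrt{3}$,
$\pi(1) = \omega(1) - \omega(2)/\sqrt{3}$. Then, substituting $\pi(3) = 1 - \pi(1) - \pi(2)$ into (53), the linear
threshold satisfies

$$\omega(2) = \frac{\sqrt{3}\,\theta(1)}{2 - \theta(1)}\, \omega(1) + \frac{\sqrt{3}\,(\theta(2) - \theta(1))}{2 - \theta(1)}.$$

So the conditions of Theorem 7 require that $\theta(1) \ge 1$, i.e., the threshold has a slope of 60° or
larger. When $\theta(1) > 2$, the slope becomes negative, i.e., the angle is more than 90°.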
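The angle statement can be checked numerically. Writing the $X = 3$ threshold $\pi(2) + \theta(1)\pi(3) = \theta(2)$ in the Cartesian coordinates above, with $\pi(3) = 1 - \pi(1) - \pi(2)$, gives a line with slope $\sqrt{3}\,\theta(1)/(2 - \theta(1))$. The short Python sketch below converts that slope to an angle; the function name is hypothetical and the slope formula rests on the threshold form assumed here.

```python
import math


def threshold_angle_deg(theta1):
    """Angle in degrees between the X = 3 linear threshold and the omega(1) axis.

    Uses slope = sqrt(3) * theta1 / (2 - theta1); atan2 keeps the angle in
    [0, 180] for theta1 >= 0 and handles the vertical case theta1 = 2.
    """
    return math.degrees(math.atan2(math.sqrt(3.0) * theta1, 2.0 - theta1))
```

Under this formula $\theta(1) = 1$ corresponds to 60° and $\theta(1) = 2$ to a vertical threshold, which agrees with the interpretation given in the text.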

Fig. 8 shows examples of a valid and an invalid linear threshold. Fig. 8(a) illustrates a valid MLR
increasing linear threshold policy. Fig. 8(b) is invalid since the slope of the threshold is less than
60°, meaning that the resulting policy is not MLR increasing on lines. Also shown is the hyperplane
$C'\pi = 0$, which by Assumption (C3) lies in polytope $\mathcal{P}_{Y+1}$.
[Figure 8, panels: (a) Valid; (b) Invalid.]

Fig. 8. Fig. 8(a) illustrates a valid MLR increasing linear threshold policy. The linear threshold policy in Fig. 8(b) violates the
requirement that the policy $\mu_\theta(\pi)$ is MLR increasing since it has a slope less than 60°. In both figures, the region to the right
of line $\eta_Y$ is the polytope $\mathcal{P}_{Y+1}$. In the figures, $C$ denotes the hyperplane $C'\pi = 0$, which lies in $\mathcal{P}_{Y+1}$ by Assumption (C3).



2) Computation of Optimal Linear Threshold: As a consequence of Theorem 7, the optimal
linear threshold approximation to the threshold curve of Theorem 5 is the solution of the following
constrained optimization problem:

$$\theta^* = \arg\min_\theta J_{\mu_\theta}(\pi_0) \quad \text{subject to } 0 \le \theta(i) \le \theta(X - 2),\ \theta(X - 2) \ge 1 \text{ and } \theta(X - 1) \ge 0, \tag{54}$$

where the cost $J_{\mu_\theta}(\pi_0)$ is obtained as in (20) by applying the threshold policy $\mu_\theta$ in (53).

Because the cost $J_{\mu_\theta}(\pi_0)$ in (54) cannot be computed in closed form, we resort to simulation
based stochastic optimization. Let $n = 1, 2, \ldots$ denote iterations of the algorithm. The aim is to
solve the following linearly constrained stochastic optimization problem:

$$\text{Compute } \theta^* = \arg\min_\theta \mathbb{E}\{J_n(\mu_\theta)\} \quad \text{subject to } 0 \le \theta(i) \le \theta(X - 2),\ \theta(X - 2) \ge 1 \text{ and } \theta(X - 1) \ge 0. \tag{55}$$
Here, for each initial condition $\pi_0$, the sample path cost $J_n(\mu_\theta, \pi_0)$ is evaluated as

$$J_n(\mu_\theta, \pi_0) = \sum_{k=1}^{\infty} \rho^{k-1} C(\pi_k, u_k), \quad \text{where } u_k = \mu_\theta(\pi_k) \text{ is computed via (53)}, \tag{56}$$

$$J_n(\mu_\theta) = \frac{1}{L} \sum_{l=1}^{L} J_n(\mu_\theta, \pi_0^{(l)}), \quad \text{where the prior } \pi_0^{(l)} \text{ is sampled uniformly from the simplex } \Pi(X).$$

A convenient way of sampling uniformly from $\Pi(X)$ is to use the Dirichlet distribution (i.e.,
$\pi_0(i) = x_i / \sum_i x_i$, where $x_i \sim$ unit exponential distribution).
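The sampling recipe just described (normalized unit exponentials, i.e. a Dirichlet distribution with all parameters equal to one) can be sketched with the Python standard library as follows. The function name is illustrative.

```python
import random


def sample_uniform_simplex(X, rng=random):
    """Draw a belief uniformly from the unit simplex Pi(X).

    Normalizing X independent unit-exponential variables yields a
    Dirichlet(1, ..., 1) sample, which is uniform on the simplex.
    """
    x = [rng.expovariate(1.0) for _ in range(X)]
    total = sum(x)
    return [xi / total for xi in x]
```

Passing a seeded generator, e.g. `sample_uniform_simplex(3, random.Random(0))`, makes the draws reproducible, which is convenient when averaging the $L$ sample path costs.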
The above stochastic optimization problem is solved by stochastic approximation algorithms
such as the Simultaneous Perturbation Stochastic Approximation (SPSA) algorithm [47]; see [26]
for a novel parametrization that deals with the hypersphere constraints. Since the stochastic
gradient algorithm converges only to a local optimum, it is necessary to try several initial
conditions. The computational cost at each iteration is linear in the dimension of $\theta$ and is
independent of the observation alphabet size $Y$. Convergence (w.p.1) can be established using
techniques in [29], [30]. More sophisticated methods than SPSA can also be used. For example, the
score function method can be used to perform gradient-based reinforcement learning. These
algorithms are applicable to the constrained stochastic optimization problem (55). Also, if the
change time distribution (specified by $P$) and the observation likelihoods (specified by $B$) are
not completely specified, then as long as the assumptions of Theorem 5 hold, the reinforcement
learning algorithms can be used to solve (55).
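A generic SPSA iteration for a problem with the constraint set of (55) might look like the Python sketch below. It is not the algorithm of the cited references: the gain sequences use commonly quoted SPSA exponents (0.602 and 0.101), the `project` step is a simple coordinate-wise clipping heuristic rather than an exact projection, and `toy_cost` is a hypothetical noisy quadratic standing in for the simulated sample path cost $J_n(\mu_\theta)$.

```python
import random


def project(theta):
    """Heuristic map onto {0 <= theta(i) <= theta(X-2), theta(X-2) >= 1, theta(X-1) >= 0}."""
    theta = list(theta)
    theta[-1] = max(theta[-1], 0.0)       # theta(X-1) >= 0
    theta[-2] = max(theta[-2], 1.0)       # theta(X-2) >= 1
    for i in range(len(theta) - 2):       # 0 <= theta(i) <= theta(X-2)
        theta[i] = min(max(theta[i], 0.0), theta[-2])
    return theta


def spsa(noisy_cost, theta0, iters=2000, a=0.1, c=0.1, A=50.0, seed=0):
    """Minimize E{noisy_cost(theta)} using two cost evaluations per iteration."""
    rng = random.Random(seed)
    theta = project(theta0)
    for n in range(iters):
        a_n = a / (n + 1 + A) ** 0.602    # step size
        c_n = c / (n + 1) ** 0.101        # perturbation size
        delta = [rng.choice((-1.0, 1.0)) for _ in theta]
        plus = [t + c_n * d for t, d in zip(theta, delta)]
        minus = [t - c_n * d for t, d in zip(theta, delta)]
        diff = noisy_cost(plus, rng) - noisy_cost(minus, rng)
        grad = [diff / (2.0 * c_n * d) for d in delta]   # simultaneous perturbation estimate
        theta = project([t - a_n * g for t, g in zip(theta, grad)])
    return theta


def toy_cost(theta, rng, target=(0.5, 1.5, 0.3)):
    """Placeholder for a simulated cost: noisy quadratic around a hypothetical target."""
    return sum((t - s) ** 2 for t, s in zip(theta, target)) + rng.gauss(0.0, 0.01)
```

In an actual experiment `toy_cost` would be replaced by a routine that simulates the social learning model and returns the averaged discounted cost, and the iteration would be restarted from several initial conditions because only local convergence can be expected.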



VI. Multi-Agent Quickest Time Detection with Adaptive Sensing

As mentioned earlier, the social learning protocol is very similar to multi-agent quickest
time detection with a sensor manager (controller). Motivated by sensor network applications,
this section describes the formulation and the main results. The information patterns are similar
to social learning and so the results developed in previous sections apply. The observations
can now also belong to a continuum.

Consider a countable number of agents indexed by $k = 1, 2, \ldots$. Each agent acts once in a
predetermined sequential order indexed by $k = 1, 2, \ldots$. Based on the current belief
state $\pi_{k-1}$, agent $k$ acts as follows:

• Agent $k$ first chooses decision $u_k \in \{1 \text{ (stop)}, 2 \text{ (continue)}\}$. If the agent decides to stop,
then as in earlier sections, a false alarm penalty is paid, and the problem terminates.

• If agent $k$ chooses $u_k = 2$, then it chooses its operating mode $a_k \in \{1, 2\}$ according to a
built-in micro-manager. Agent $k$ then views the world according to this mode; that is, it
obtains observation $y_k$ from a distribution that depends on mode $a_k$. It then communicates
its belief state $\pi_k$ to the next agent.

Remark: An equivalent formulation is as follows: A single smart sensor adapts its operating mode
$a_k$ at each time $k$ based on the posterior distribution of the underlying state at the previous time
instant. How can quickest detection be achieved with this sensor?
How can such a network of agents, where each agent makes autonomous micro-management
decisions on its mode, achieve quickest time detection? The quickest time detector can be viewed
as a macro-manager that operates on the belief states and micro-manager decisions. Clearly
the micro and macro-managers interact: the local decision $a_k$ taken by the micro-manager
determines $y_k$, which determines $\pi_k$ and hence determines decision $u_{k+1}$ of the quickest time
macro-manager.

A. Micro-manager for Agent Mode Selection 

1) Costs and mode selection: As in (8), let $c_a$ denote the local cost of deploying sensor mode
$a \in A = \{1, 2\}$. To avoid trivial solutions, as in Sec. IV-A we make the submodular assumption
(S).

Similar to the social learning formulation, the micro-manager picks local decision $a_k$
myopically as follows: Based on the belief state $\pi_{k-1}$ of the previous agent, each agent $k$ picks
its mode $a_k \in A = \{1, 2\}$, i.e., which sensor to deploy, by minimizing its expected predicted cost:

$$a_k = \arg\min_{a \in \{1, 2\}} \mathbb{E}\{c(x_k, a) \mid \mathcal{F}_{k-1}\} = \arg\min_{a \in \{1, 2\}} c_a' P' \pi_{k-1}, \tag{57}$$

where $\mathcal{F}_k$ denotes the filtration $\sigma(y_l, l \le k)$. Define the convex polytopes $\mathcal{P}_1$ and
$\mathcal{P}_2$ that partition $\Pi(X)$ as

$$\mathcal{P}_1 = \{\pi : (c_1 - c_2)'P'\pi \ge 0\}, \qquad \mathcal{P}_2 = \{\pi : (c_1 - c_2)'P'\pi < 0\}. \tag{58}$$

Then from (57) it follows that for $\pi \in \mathcal{P}_1$, $a_k = 2$ and for $\pi \in \mathcal{P}_2$, $a_k = 1$.
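The mode selection rule (57) and the induced partition (58) amount to checking the sign of $(c_1 - c_2)'P'\pi_{k-1}$. A minimal Python sketch follows; the function names and the cost matrix layout `c[x][a]` are assumptions, and ties on the boundary are assigned to mode 2, which is a convention chosen here rather than one stated in the text.

```python
def predicted_belief(pi, P):
    """One-step Bayesian predictor P' pi."""
    X = len(pi)
    return [sum(P[i][j] * pi[i] for i in range(X)) for j in range(X)]


def select_mode(pi_prev, P, c):
    """Myopic micro-manager: return mode 1 or 2 minimizing c_a' P' pi_{k-1}.

    c[x][0] and c[x][1] are the local costs of modes 1 and 2 in state x.
    """
    pred = predicted_belief(pi_prev, P)
    # (c_1 - c_2)' P' pi >= 0 means mode 2 is no more expensive than mode 1
    diff = sum((c[x][0] - c[x][1]) * pred[x] for x in range(len(pred)))
    return 2 if diff >= 0 else 1
```

With a hypothetical cost matrix in which mode 1 is cheap in state 1 and mode 2 is cheap in state 2, beliefs concentrated on state 1 select mode 1 and beliefs concentrated on state 2 select mode 2.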

2) Mode dependent observations: The agent then makes an observation $y_k$ depending on its
choice of mode $a_k$ in (57). That is, agent $k$ obtains an observation from the
conditional probability distribution

$$P(y_k \le \bar{y} \mid x_k = e_x, a_k = a) = \sum_{y \le \bar{y}} B^{(a)}_{xy}, \qquad x \in X,\ a \in \{1, 2\}. \tag{59}$$

Here $\sum$ denotes integration with respect to the Lebesgue measure (in which case $Y \subset \mathbb{R}$ and
$B^{(a)}_{xy}$ is the conditional probability density function) or the counting measure (in which case $Y$ is
a subset of the integers and $B^{(a)}_{xy}$ is the conditional probability mass function
$B^{(a)}_{xy} = P(y_k = y \mid x_k = x, a_k = a)$).
The key point is that unlike classical quickest detection, each agent now views the world based
on its selected mode $a_k$.
Fig. 9. The figure illustrates the setup in Sec. VI. The mode dependent observation probabilities $B^{(a)}$, $a \in \{1, 2\}$, are chosen
depending on whether the belief state $\pi$ lies in polytope $\mathcal{P}_1$ or $\mathcal{P}_2$ defined in (58). The aim is to perform quickest detection given this mode
dependent observation probability constraint.



Let $T^{(a)}(\pi, y)$ denote the belief state update if mode $a$ is chosen and measurement $y$ obtained. It
is given by the HMM filter with mode dependent probabilities $B^{(a)}_y = \mathrm{diag}(P(y \mid x, a), x \in X)$.
That is,

$$T^{(a)}(\pi, y) = \frac{B^{(a)}_y P'\pi}{\sigma(\pi, y)}, \qquad \sigma(\pi, y) = \mathbf{1}' B^{(a)}_y P'\pi. \tag{60}$$
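For a finite observation alphabet, the mode dependent filter (60) can be sketched as below. The function name and the nested-list layout `B[a][x][y]` are assumptions, and modes and observations are indexed from zero in the code even though the text numbers modes 1 and 2.

```python
def mode_filter(pi, y, a, P, B):
    """Belief update T^(a)(pi, y) = B_y^(a) P' pi / sigma, with sigma = 1' B_y^(a) P' pi.

    B[a][x][y] is the probability of observing y in state x under mode a.
    Returns the pair (updated belief, sigma).
    """
    X = len(pi)
    # Prediction step: P' pi
    pred = [sum(P[i][j] * pi[i] for i in range(X)) for j in range(X)]
    # Correction step: weight each state by its mode-a observation likelihood
    unnorm = [B[a][x][y] * pred[x] for x in range(X)]
    sigma = sum(unnorm)
    if sigma == 0.0:
        raise ValueError("observation has zero probability under this belief")
    return [u / sigma for u in unnorm], sigma
```

The normalizing constants $\sigma(\pi, y)$ sum to one over $y$ for each fixed mode, which gives a quick sanity check on any candidate observation matrices.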



B. Macro-Manager for Quickest Time Detection 

Below we present the assumptions and main result. Based on the above micro-manager protocol,
the aim is to perform quickest time change detection. So the quickest detection problem can
be viewed as optimizing the cost function (24) subject to the constraint that the belief state evolves
according to (60). The setup is identical to that in Sec. II-B and Sec. III-A. For $k < \tau$, agents
choose $u_k = 2$ (continue) and at $k = \tau$, agent $k$ picks $u_k = 1$ (declares a change and stops).
The optimal policy $\mu^*(\pi)$ of the macro-manager satisfies Bellman's equation (25).

The following theorems mimic the results for the social learning based quickest detection
problem, and their proofs are identical.

1) Blackwell Dominance: Suppose the mode dependent observation matrices are of the form
$B^{(a)} = B Q^{(a)}$ where $B$ and $Q^{(a)}$, $a = 1, 2$, are stochastic kernels. Then a proof identical to
that of the Blackwell dominance theorem for social learning shows that classical quickest detection
with observation matrix $B$ always yields a lower cost than mode dependent quickest detection with
observation matrices $B^{(a)}$, where the mode $a$ is chosen according to any arbitrary strategy.
2) Threshold Policies: Consider the following assumptions that are similar to (C1) in Sec. V-A
and (C2) in Sec. V-B. Recall that the vertices $\nu_j$ are defined in (45) and let $\bar{\nu}_j$ denote the vertices
of the hyperplane $(c_1 - c_2)'P'\pi = 0$.
(C1) If $\{\pi : C'\pi = 0\}$ lies in one of the polytopes $\mathcal{P}_a$, then $C'B^{(a)}_y P'\nu_j \ge 0$,
$j = 1, \ldots, X - 1$ for all $y \in Y$.
(C2) $(c_1 - c_2)'P'B^{(a)}_y P'\bar{\nu}_j \le 0$, $j = 1, \ldots, X - 1$, $y \in Y$.

We have the following result regarding the structure of $\mu^*(\pi)$ for quickest time detection.

Theorem 8. Theorems 4 and 5 hold for the optimal quickest time decision policy $\mu^*(\pi)$ of the
macro-manager. Also Theorem 7 holds for MLR policies, and computation of the optimal linear
threshold can be formulated as the stochastic optimization problem (55).



(C1) and (C2) are relatively easy to check even if $y \in Y$ is a continuum, as shown below. For all
$x$, let $y_{\max}$ denote the maximum of the support of the distribution $B^{(a)}_{xy}$, i.e.,
$y_{\max} = \sup\{y : B^{(a)}_{xy} > 0\}$.

Lemma 4. (C1), (C2) hold if their inequalities hold for $y = y_{\max}$.

Thus only a finite number of inequalities need to be verified. In particular for a Gaussian
distribution, since $y_{\max} = \infty$, the filter $T^{(a)}(\pi, \infty)$ becomes the Bayesian predictor $P'\pi$. So it
suffices to check that $C'P'\nu_j \ge 0$ for (C1) to hold.

Proof: Consider (C1). $C'B^{(a)}_y P'\nu_j \ge 0$ is equivalent to $C'B^{(a)}_y P'\nu_j / \sigma(\nu_j, y) \ge 0$
since $\sigma(\pi, y)$ is non-negative for all $\pi \in \Pi(X)$. So we need to check that $C'T^{(a)}(\nu_j, y) \ge 0$ for
all $y \in Y$. But since $P$ and $B$ are TP2 according to Assumptions (A1), (A2), from Theorem 10
in the Appendix the belief state update $T^{(a)}(\pi, y)$ is MLR increasing in $y$. Moreover by (A3)
$C$ has decreasing elements. Therefore, from the result in Appendix A, $C'T^{(a)}(\pi, y)$ is decreasing
in $y$. So it suffices to check that $C'T^{(a)}(\nu_j, y_{\max}) \ge 0$. ■

VII. Numerical Results 

In addition to the numerical examples presented earlier, this section presents two numerical
examples. The first example illustrates the multiple threshold policies inherent in social learning
(this example was mentioned earlier). The second example illustrates the optimal threshold
curve for a PH-type distributed change time, whose existence was proved in Theorem 5.
Example 1. Geometric Distributed Change Time: This example illustrates the existence
of a triple threshold policy for quickest time change detection when the change time $\tau^0$ is
geometrically distributed. We chose the social learning model with parameters $X = \{1, 2\}$ (so
$\Pi(X) = [0, 1]$ is a one dimensional simplex), $Y = \{1, 2, 3\}$, $A = \{1, 2\}$,

$$B = \begin{bmatrix} 0.9 & 0.1 \\ 0.1 & 0.9 \end{bmatrix}, \qquad \mathbb{E}\{\tau^0\} = 20 \implies P = \begin{bmatrix} 1 & 0 \\ 0.05 & 0.95 \end{bmatrix}.$$

The entries of the local cost matrix $c = (c(i, a))$ are only partly legible here; the values 1, 2 and -3.57 survive.

For the global quickest time detection parameters we chose $\rho = 0.99$, delay $d = 1.25$, false
alarm vector $f = 3e_2$ (i.e., $f_2 = 3$). It is easily checked that (A1), (A2) and (S) hold.

The optimal policy $\mu^*(\pi)$ is shown in panel (a) of the accompanying figure and comprises a triple
threshold policy. It was computed by constructing a uniform grid of 500 points for $\pi(2) \in [0, 1]$
and then implementing the value iteration algorithm (27) for a horizon of $N = 200$. The 'x' marks
in panels (a) and (b) are the values of $\eta_2(2)$, q(2) and $\eta_1(2)$, respectively.

Example 2. Phase Distributed Change Time: This example illustrates Theorem 5, which
proved the existence of a single threshold curve for social learning based quickest time change
detection with PH-distributed change time. We model the PH-distribution via a 3-state Markov
chain. So the belief space $\Pi(X)$ is a two dimensional simplex (equilateral triangle) and can be
visualized easily.

We chose the social learning model with parameters $X = \{1, 2, 3\}$, $Y = \{1, 2, 3, 4, 5\}$,
$A = \{1, 2\}$. The observation probabilities and local decision costs $c = (c(i, a))$ were chosen with

$$B_{1,y} \propto \exp(-(y - 1)^2/6), \qquad B_{2,y} = B_{3,y} \propto \exp(-(y - 5)^2/6).$$

The global costs for quickest detection in (17) and (18) were chosen as $d = 1.5$, $f = [0\ 20\ 25]'$
and discount factor $\rho = 0.9$.

The PH-distributed change times were modelled by the 3 state Markov chain with transition
probability $P$. To illustrate the quickest time detection, we chose 4 candidate transition probability
matrices, namely, $P^{(1)}$, $P^{(2)}$, $P^{(3)}$ and $P^{(4)}$.
Fig. 10. Plots of the probability mass function $\nu_k$ of the change time $\tau^0$ for $P^{(1)}$ (geometric distribution) and $P^{(2)}$, $P^{(3)}$ (phase-type
distributions).



Note $P^{(1)}$ models the geometric distribution since states 2 and 3 are indistinguishable; in fact
it is exactly lumpable [24] into the 2 state Markov chain with transition matrix

$$\begin{bmatrix} 1 & 0 \\ 0.1 & 0.9 \end{bmatrix}.$$

Fig. 10 plots the probability mass function $\nu_k$ of the PH-distributed change time $\tau^0$
for these four transition matrices for $\bar{\pi}_0 = [0.03, 0.97]'$. Fig. 10 shows that these PH-distributions
are quite different in behavior to a geometric distribution: they are non-monotone and have heavier
tails.
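The probability mass function of the change time can be computed for any candidate transition matrix by propagating the state distribution and differencing the mass in the absorbing state. The Python sketch below does this; the function name is hypothetical, state 1 (index 0 in the code) is taken to be the absorbing state, and the time indexing ($\nu_0$ is the initial mass in state 1) is an assumption that may differ from the paper's definition of $\nu_k$.

```python
def change_time_pmf(pi0, P, kmax):
    """Return [nu_0, ..., nu_kmax], where nu_k = P(first time in absorbing state = k).

    pi0 : initial distribution over the X states (index 0 is the absorbing state)
    P   : X x X transition matrix with P[0][0] = 1
    """
    X = len(pi0)
    dist = list(pi0)
    pmf = [dist[0]]
    for _ in range(kmax):
        absorbed_before = dist[0]
        # Propagate one step: dist <- P' dist
        dist = [sum(P[i][j] * dist[i] for i in range(X)) for j in range(X)]
        # Newly absorbed mass at this step
        pmf.append(dist[0] - absorbed_before)
    return pmf
```

For the lumped 2-state chain above, started in state 2, this yields a geometric pmf (0.1, 0.09, ...) that decreases monotonically, whereas a hypothetical 3-state chain that must pass through state 2 before absorption, such as rows (1, 0, 0), (0.2, 0.8, 0), (0, 0.2, 0.8) started in state 3, produces a pmf that first rises and then falls.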

It is easily checked that (A1), (A2), (A3), (S), (PH), (C2) and (C3) of Theorem 5 hold. Fig. 11
shows the optimal decision policies for these four cases with the stopping set $\mathcal{S}$ shaded. The
optimal policy was computed as follows. A 50 x 50 grid of $(\pi(1), \pi(2))$ values was formed
within the 2-dimensional unit simplex $\Pi(X)$. Then the value iteration algorithm (27) was run
for horizon $N = 200$ (in all cases $\|V_{200}(\pi) - V_{199}(\pi)\|_\infty < 10^{-15}$, implying that the value
iteration algorithm converged). In all 4 cases, the optimal decision policy is characterized by a
single threshold curve in polytope $\mathcal{P}_6$. This is consistent with Theorem 5.

Each plot of Fig. 11 also shows the hyperplanes $C'\pi = 0$ (defined in (24)) and $\eta_5$ (defined
in (35)). The polytope $\mathcal{P}_6$ is to the right of hyperplane $\eta_5$. The remaining line segments from
left to right are $\eta_1, \ldots, \eta_4$. Note that the hyperplane $C'\pi = 0$ lies in $\mathcal{P}_6$, thereby satisfying
Assumptions (PH) and (C3).
[Figure 11, panels: (a) $P^{(1)}$; (b) $P^{(2)}$; (c) $P^{(3)}$; (d) $P^{(4)}$.]

Fig. 11. Optimal decision policy for quickest time change detection for the geometric
distribution (transition probability $P^{(1)}$) and phase-type distributions (transition probabilities $P^{(2)}$, $P^{(3)}$ and $P^{(4)}$). The shaded
region depicts the stopping set $\mathcal{S}$ in (26). The parameters are specified in Example 2 of Sec. VII.



Two of these cases also satisfy Assumptions (C1) and (PH), and so Theorem 4 holds.
Therefore, for these two cases, the optimal threshold curve is the linear hyperplane $C'\pi = 0$, as
can be seen in Fig. 11.

VIII. Conclusions 

Motivated by understanding how local and global decision making interact, this paper has presented structural results for quickest time detection when agents perform social learning. A related model incorporating multi-agent sensor scheduling and quickest time detection was also considered. Unlike classical quickest detection, the optimal policy can have multiple thresholds.

Four main results were presented. First, Theorem 1 showed, using Blackwell dominance of measures, that social learning based quickest detection always incurs a higher cost than classical quickest detection. Second, for symmetric observation probabilities and geometric change times, the explicit multi-threshold behavior of social learning based quickest detection was characterized in Theorem 3 by approximating with a simpler detection problem. Third, quickest time change detection for more general PH-type distributed change times was considered. Theorem 4 gave sufficient conditions for the optimal policy to be characterized by a single linear hyperplane in the multi-dimensional simplex of posterior distributions. Finally, using lattice programming and likelihood ratio dominance, Theorem 5 gave sufficient conditions for the optimal policy to be characterized by a single switching curve. The optimal linear approximation to this curve (that preserves the MLR monotone nature of the policy) was characterized in Theorem 7.

The results of this paper extend straightforwardly to more general stopping problems where the underlying Markov chain does not have an absorbing state, as long as the transition matrix satisfies assumption (A2). In current work, we are using similar social learning models for "order-book" trades in agent-based models for algorithmic market making; see also [38].



Appendix 

A. Preliminaries: Stochastic Dominance, Submodularity 

Excellent background references for stochastic dominance and lattice programming are [49], [25], [35], [23]. The proofs of Theorem 2 and Theorem 5 require concepts in stochastic dominance. In particular, Statement (iv) of Theorem 5 states that the optimal social policy $\mu^*(\pi)$ is monotonically increasing in belief state $\pi$. In order to compare belief states $\pi$ and $\bar{\pi}$, we will use the monotone likelihood ratio (MLR) stochastic ordering and a specialized version of the MLR order restricted to lines in the simplex $\Pi(X)$. The MLR order is useful for social learning
since it is preserved after conditioning HOl . E3l . Il35i 

Definition 1 (MLR ordering, [35, pp. 12-15]). Let $\pi_1, \pi_2 \in \Pi(X)$ be any two belief state vectors. Then $\pi_1$ is greater than $\pi_2$ with respect to the MLR ordering, denoted as $\pi_1 \geq_r \pi_2$, if

$\pi_1(i)\pi_2(j) \le \pi_2(i)\pi_1(j), \quad i < j,\ i,j \in \{1,\ldots,X\}.$ (61)

Definition 2 (First order stochastic dominance). Let $\pi_1, \pi_2 \in \Pi(X)$. Then $\pi_1$ first order stochastically dominates $\pi_2$, denoted as $\pi_1 \geq_s \pi_2$, if $\sum_{i=j}^{X} \pi_1(i) \ge \sum_{i=j}^{X} \pi_2(i)$ for $j = 1, \ldots, X$.



Result 1 ([35]). (i) $\pi_1 \geq_r \pi_2$ implies $\pi_1 \geq_s \pi_2$. (For $X = 2$, $\geq_r$ and $\geq_s$ are equivalent.)

(ii) Let $\mathcal{V}$ denote the set of all $X$ dimensional vectors $v$ with nondecreasing components, i.e., $v_1 \le v_2 \le \cdots \le v_X$. Then $\pi_1 \geq_s \pi_2$ iff for all $v \in \mathcal{V}$, $v'\pi_1 \ge v'\pi_2$.

(iii) Suppose $f_i \ge g_i$, $i = 1, \ldots, X$ and $f_i$, $g_i$ are increasing in $i$. Then $\pi \geq_s \bar{\pi}$ implies $\sum_i f_i \pi_i \ge \sum_i g_i \bar{\pi}_i$. (This follows since from (ii) $\sum_i g_i \pi_i \ge \sum_i g_i \bar{\pi}_i$, and $\sum_i f_i \pi_i \ge \sum_i g_i \pi_i$ since $f_i \ge g_i$ for all $i$.)
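Definitions 1 and 2 and Result 1 can be checked numerically on small examples. The sketch below is illustrative only: the belief vectors and the helper names `mlr_geq` and `fsd_geq` are hypothetical, and a small tolerance is used to absorb floating-point rounding.

```python
TOL = 1e-12


def mlr_geq(p1, p2):
    """True if p1 >=_r p2, i.e. p1[i]*p2[j] <= p2[i]*p1[j] for all i < j (Definition 1)."""
    X = len(p1)
    return all(p1[i] * p2[j] <= p2[i] * p1[j] + TOL
               for i in range(X) for j in range(i + 1, X))


def fsd_geq(p1, p2):
    """True if p1 >=_s p2, i.e. every tail sum of p1 dominates that of p2 (Definition 2)."""
    return all(sum(p1[j:]) >= sum(p2[j:]) - TOL for j in range(len(p1)))


# Hypothetical beliefs: p_hi puts more mass on high states than p_lo.
p_hi = [0.1, 0.3, 0.6]
p_lo = [0.3, 0.4, 0.3]
print(mlr_geq(p_hi, p_lo), fsd_geq(p_hi, p_lo))   # Result 1(i): MLR implies first order

# Result 1(ii): expectations of a nondecreasing vector are ordered.
v = [0.0, 1.0, 5.0]
dot = lambda a, b: sum(x * y for x, y in zip(a, b))
print(dot(v, p_hi) >= dot(v, p_lo))

# For X > 2 the MLR order is only partial: this pair is not comparable.
a = [0.2, 0.6, 0.2]
b = [0.4, 0.2, 0.4]
print(mlr_geq(a, b), mlr_geq(b, a))
```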

For state-space dimension $X = 2$, MLR is a complete order and coincides with first order stochastic dominance. For state-space dimension $X > 2$, MLR is a partial order, i.e., $[\Pi(X), \geq_r]$ is a partially ordered set (poset), since it is not always possible to order any two belief states $\pi \in \Pi(X)$.

Finally, we define a modification of the MLR order on certain line segments in the simplex 
which yields a total ordering. 

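Define the set of belief states $\mathcal{H}_i = \{\bar{\pi} \in \Pi(X) : \bar{\pi}(i) = 0\}$. For each belief state $\bar{\pi} \in \mathcal{H}_i$, denote by $l(e_i, \bar{\pi})$ the line segment that connects $\bar{\pi}$ to $e_i$. Thus

$l(e_i, \bar{\pi}) = \{\pi \in \Pi(X) : \pi = (1 - \epsilon)\bar{\pi} + \epsilon e_i,\ 0 \le \epsilon \le 1\}, \quad \bar{\pi} \in \mathcal{H}_i.$ (62)

Definition 3 (MLR ordering $\geq_{L_1}$ and $\geq_{L_X}$ on lines). $\pi_1$ is greater than $\pi_2$ with respect to the MLR ordering on the line $l(e_1, \bar{\pi})$, denoted as $\pi_1 \geq_{L_1} \pi_2$, if $\pi_1, \pi_2 \in l(e_1, \bar{\pi})$ for some $\bar{\pi} \in \mathcal{H}_1$ and $\pi_1 \geq_r \pi_2$. Similarly, $\pi_1 \geq_{L_X} \pi_2$ if $\pi_1, \pi_2 \in l(e_X, \bar{\pi})$ for some $\bar{\pi} \in \mathcal{H}_X$ and $\pi_1 \geq_r \pi_2$.

Note that $[\Pi(X), \geq_{L_1}]$ is a chain, i.e., all elements $\pi, \tilde{\pi} \in l(e_1, \bar{\pi})$ are comparable, i.e., either $\pi \geq_{L_1} \tilde{\pi}$ or $\tilde{\pi} \geq_{L_1} \pi$. Similarly $[\Pi(X), \geq_{L_X}]$ is a chain. In Lemma 5 we summarize useful properties of $[\Pi(X), \geq_{L_1}]$ that will be used in our proofs.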

Lemma 5. Consider $[\Pi(X), \geq_r]$, $[l(e_X, \bar{\pi}), \geq_{L_X}]$.

(i) On $[\Pi(X), \geq_r]$, $e_1$ is the least and $e_X$ is the greatest element. On $[l(e_X, \bar{\pi}), \geq_{L_X}]$, $\bar{\pi}$ is the least and $e_X$ is the greatest.

(ii) Convex combinations of MLR comparable belief states form a chain. For any $\gamma \in [0,1]$, $\pi \leq_r \tilde{\pi} \implies \pi \leq_r \gamma\pi + (1 - \gamma)\tilde{\pi} \leq_r \tilde{\pi}$.

(iii) All points on a line $l(e_X, \bar{\pi})$ are MLR comparable. Consider any two points $\pi^{\gamma_1}, \pi^{\gamma_2} \in l(e_X, \bar{\pi})$ (see (62)), where $\pi^{\gamma} = \gamma e_X + (1 - \gamma)\bar{\pi}$. Then $\gamma_1 \ge \gamma_2$ implies $\pi^{\gamma_1} \geq_{L_X} \pi^{\gamma_2}$. A similar result holds for $l(e_1, \bar{\pi})$.
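Lemma 5(iii) can be illustrated numerically: points generated along a line $l(e_X, \bar{\pi})$ are pairwise MLR comparable and ordered by the mixing weight. In this sketch the anchor belief `pi_bar`, the weights and the helper names are hypothetical choices made for illustration.

```python
TOL = 1e-12


def mlr_geq(p1, p2):
    """True if p1 >=_r p2 in the sense of (61)."""
    X = len(p1)
    return all(p1[i] * p2[j] <= p2[i] * p1[j] + TOL
               for i in range(X) for j in range(i + 1, X))


def point_on_line(pi_bar, gamma):
    """Return gamma * e_X + (1 - gamma) * pi_bar, a point on the line l(e_X, pi_bar)."""
    point = [(1.0 - gamma) * b for b in pi_bar]
    point[-1] += gamma
    return point


# Hypothetical anchor in H_X: its last component is zero.
pi_bar = [0.5, 0.3, 0.2, 0.0]
gammas = [0.0, 0.25, 0.5, 0.75, 1.0]
points = [point_on_line(pi_bar, g) for g in gammas]

# A larger weight on e_X gives an MLR larger belief, so the points form a chain.
chain_ok = all(mlr_geq(points[k + 1], points[k]) for k in range(len(points) - 1))
print(chain_ok)
```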

Definition 4 (Submodular function [49]). $f : l(e_1, \bar{\pi}) \times \{1, 2\} \to \mathbb{R}$ is submodular (antitone differences) if $f(\pi, u) - f(\pi, \bar{u}) \le f(\tilde{\pi}, u) - f(\tilde{\pi}, \bar{u})$, for $\bar{u} \le u$, $\pi \geq_{L_1} \tilde{\pi}$.

The following result says that for a submodular function $Q(\pi, u)$, $u^*(\pi) = \arg\min_u Q(\pi, u)$ is increasing in its argument $\pi$. This implies $\mu^*(\pi)$ is MLR increasing on the line segments $l(e_X, \bar{\pi})$, which in turn will be used to prove the existence of a threshold decision curve.

Theorem 9 ([49]). If $f : l(e_1, \bar{\pi}) \times \{1, 2\} \to \mathbb{R}$ is submodular, then there exists a $u^*(\pi) = \arg\min_{u \in \{1,2\}} f(\pi, u)$ that is increasing on $[l(e_1, \bar{\pi}), \geq_{L_1}]$, i.e., $\tilde{\pi} \geq_{L_1} \pi \implies u^*(\pi) \le u^*(\tilde{\pi})$.
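A toy numerical check of Theorem 9 is sketched below. The chain is parametrized by a scalar position `eps` along a line, and the two-action cost `f` is a hypothetical example (stopping costs 0, continuing has a cost that decreases along the chain); neither is taken from the paper.

```python
def f(eps, u):
    """Hypothetical cost: u = 1 (stop) costs 0, u = 2 (continue) costs 1 - 3*eps."""
    return 0.0 if u == 1 else 1.0 - 3.0 * eps


def is_submodular(cost, chain, actions):
    """Check cost(hi,u) - cost(hi,ub) <= cost(lo,u) - cost(lo,ub) for hi >= lo and ub < u."""
    for k, lo in enumerate(chain):
        for hi in chain[k:]:
            for ub in actions:
                for u in actions:
                    if ub < u and cost(hi, u) - cost(hi, ub) > cost(lo, u) - cost(lo, ub) + 1e-12:
                        return False
    return True


def argmin_action(cost, x, actions):
    """Smallest minimizing action (ties broken toward the lower action)."""
    return min(actions, key=lambda u: (cost(x, u), u))


chain = [k / 10 for k in range(11)]       # ordered points on the line
actions = [1, 2]
submodular = is_submodular(f, chain, actions)
policy = [argmin_action(f, x, actions) for x in chain]
print(submodular, policy)
```

Because the cost difference between the two actions decreases along the chain, the minimizing action switches at most once, from 1 to 2.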

Definition 5 (Single Crossing Condition [49]). $g : \mathcal{Y} \times \mathcal{A} \to \mathbb{R}$ satisfies a single crossing condition in $(y, a)$ if $g(y, a) - g(y, \bar{a}) \ge 0$ implies $g(\bar{y}, a) - g(\bar{y}, \bar{a}) \ge 0$ for $\bar{a} > a$ and $\bar{y} > y$. Then $a^*(y) = \arg\min_a g(y, a)$ is increasing in $y$.

Definition 6 (TP2 ordering and Reflexive TP2 distributions). Let $P$ and $Q$ denote any two multivariate probability mass functions. Then:

(i) $P \geq_{TP2} Q$ if $P(i)Q(j) \le P(i \vee j)Q(i \wedge j)$. If $P$ and $Q$ are univariate, then this definition is equivalent to the MLR ordering $P \geq_r Q$ defined above.

(ii) A multivariate distribution $P$ is said to be multivariate TP2 if $P \geq_{TP2} P$ holds, i.e., $P(i)P(j) \le P(i \vee j)P(i \wedge j)$.

(iii) If $i, j \in \{1, \ldots, X\}$ are scalar indices, Statement (ii) is equivalent to saying that an $M \times N$ matrix $A$ is TP2 if all second order minors are non-negative; if $i \ge j$, then the $i$-th row of $A$ MLR dominates the $j$-th row.
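Statement (iii) of Definition 6 gives a direct computational test for TP2 matrices. The following sketch checks all second order minors; the matrices and the function names are hypothetical examples, not the parameters used in the paper.

```python
def is_tp2(A, tol=1e-12):
    """True if every second order minor of A is non-negative."""
    rows, cols = len(A), len(A[0])
    return all(A[i][j] * A[k][l] - A[i][l] * A[k][j] >= -tol
               for i in range(rows) for k in range(i + 1, rows)
               for j in range(cols) for l in range(j + 1, cols))


def matmul(A, B):
    """Plain matrix product of two nested-list matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


# Hypothetical observation matrix whose rows are MLR increasing.
B_obs = [[0.7, 0.2, 0.1],
         [0.3, 0.4, 0.3],
         [0.1, 0.2, 0.7]]

# Hypothetical matrix whose second row is MLR smaller than its first row.
not_tp2 = [[0.2, 0.8],
           [0.9, 0.1]]

print(is_tp2(B_obs), is_tp2(not_tp2))
print(is_tp2(matmul(B_obs, B_obs)))   # a product of TP2 matrices is again TP2
```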

B. Proof of Theorem 1

Let $\underline{V}_k(\pi)$ denote the value function at iteration $k$ of the value iteration algorithm (27) associated with the classical quickest detection Bellman equation (29). Recall $V_k(\pi)$ is the value function associated with the social learning based quickest detection problem (25).

We start with the following lemma, which is proved at the end of Appendix B.




Lemma 6. $\sum_a \underline{V}_k(T^\pi(\pi, a))\sigma(\pi, a) \ge \sum_y \underline{V}_k(T(\pi, y))\sigma(\pi, y)$.

The proof of Theorem 1 then follows by mathematical induction using the value iteration algorithm (27). Assume $V_k(\pi) \ge \underline{V}_k(\pi)$ for $\pi \in \Pi(X)$. Then

$C(\pi, 2) + \sum_a V_k(T^\pi(\pi, a))\sigma(\pi, a) \ge C(\pi, 2) + \sum_a \underline{V}_k(T^\pi(\pi, a))\sigma(\pi, a) \ge C(\pi, 2) + \sum_y \underline{V}_k(T(\pi, y))\sigma(\pi, y)$

where the second inequality follows from Lemma 6. Thus $V_{k+1}(\pi) \ge \underline{V}_{k+1}(\pi)$. This completes the induction step. Since value iteration converges pointwise, $V(\pi) \ge \underline{V}(\pi)$, thus proving the theorem.

Proof of Lemma 6:
Step 1: First, let us show that $\underline{V}_k(\pi)$ is concave over $\Pi(X)$ for any $k$ by induction. Recall from (27) that $\underline{V}_0(\pi) = -C(\pi, 1)$, which is linear in $\pi \in \Pi(X)$. Assume $\underline{V}_k(\pi)$ is concave at iteration $k$. Note that $\underline{V}_k(\pi)$ is positively homogeneous, i.e., for any $c > 0$, $\underline{V}_k(c\pi) = c\underline{V}_k(\pi)$. So the value iteration algorithm (27) associated with Bellman's equation (29) is

$\underline{V}_{k+1}(\pi) = \min\{C'\pi + \rho \sum_y \underline{V}_k(B_y P'\pi),\ 0\}.$

Since the composition of a concave function with a linear function preserves concavity, $\sum_y \underline{V}_k(B_y P'\pi)$ is concave and so $\underline{V}_{k+1}(\pi)$ is concave.

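Step 2: We then use the Blackwell dominance condition (13). The social learning filter (11) can be expressed in terms of the Hidden Markov Model filter $T(\pi, y)$ as

$T^\pi(\pi, a) = \sum_y T(\pi, y)\frac{\sigma(\pi, y)}{\sigma(\pi, a)}P(a|y, \pi)$ and $\sigma(\pi, a) = \sum_y \sigma(\pi, y)P(a|y, \pi)$.

Therefore, $\frac{\sigma(\pi, y)}{\sigma(\pi, a)}P(a|y, \pi)$ is a probability measure with respect to $y$. Since from Step 1, $\underline{V}_k(\cdot)$ is concave for $\pi \in \Pi(X)$, using Jensen's inequality it follows that

$\underline{V}_k(T^\pi(\pi, a)) = \underline{V}_k\Big(\sum_y T(\pi, y)\frac{\sigma(\pi, y)}{\sigma(\pi, a)}P(a|y, \pi)\Big) \ge \sum_y \underline{V}_k(T(\pi, y))\frac{\sigma(\pi, y)}{\sigma(\pi, a)}P(a|y, \pi)$

implying $\sum_a \underline{V}_k(T^\pi(\pi, a))\sigma(\pi, a) \ge \sum_y \underline{V}_k(T(\pi, y))\sigma(\pi, y)$.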
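The inequality of Lemma 6 can be checked numerically for any concave function, since it rests only on Jensen's inequality. In the sketch below, the matrices `P` and `B`, the belief `pi`, the deterministic observation-to-action map `decision` and the piecewise linear concave function `concave_value` are all hypothetical stand-ins for the quantities in the proof.

```python
def hmm_filter(pi, P, B):
    """Return posteriors T(pi, y) and observation probabilities sigma(pi, y) for each y."""
    X, Y = len(pi), len(B[0])
    pred = [sum(P[i][j] * pi[i] for i in range(X)) for j in range(X)]   # P' pi
    posts, sigmas = [], []
    for y in range(Y):
        unnorm = [B[x][y] * pred[x] for x in range(X)]                  # B_y P' pi
        s = sum(unnorm)
        sigmas.append(s)
        posts.append([u / s for u in unnorm])
    return posts, sigmas


def social_filter(posts, sigmas, decision, n_actions):
    """Mix the HMM posteriors over all observations y that lead to the same action a."""
    X = len(posts[0])
    posts_a, sigmas_a = [], []
    for a in range(n_actions):
        ys = [y for y in range(len(sigmas)) if decision[y] == a]
        s = sum(sigmas[y] for y in ys)
        sigmas_a.append(s)
        posts_a.append([sum(posts[y][x] * sigmas[y] for y in ys) / s for x in range(X)])
    return posts_a, sigmas_a


def concave_value(pi):
    """Hypothetical concave function: minimum of a few linear functions of pi."""
    lines = [[0.0, -1.0, -2.0], [-1.5, -0.5, 0.0], [-0.4, -0.4, -0.4]]
    return min(sum(g * p for g, p in zip(line, pi)) for line in lines)


P = [[1.0, 0.0, 0.0], [0.1, 0.9, 0.0], [0.0, 0.1, 0.9]]
B = [[0.7, 0.2, 0.1], [0.3, 0.4, 0.3], [0.1, 0.2, 0.7]]
pi = [0.2, 0.3, 0.5]
decision = [0, 0, 1]          # observations 0 and 1 map to action 0, observation 2 to action 1

posts_y, sigmas_y = hmm_filter(pi, P, B)
posts_a, sigmas_a = social_filter(posts_y, sigmas_y, decision, n_actions=2)

lhs = sum(concave_value(p) * s for p, s in zip(posts_a, sigmas_a))   # coarser information (actions)
rhs = sum(concave_value(p) * s for p, s in zip(posts_y, sigmas_y))   # finer information (observations)
print(lhs, rhs, lhs >= rhs - 1e-12)
```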
C. Proof of Theorem 2

Here we present a detailed version of Theorem 2 that was presented in Sec. IV-A.

Theorem 2 (Detailed version). Under (A1), (A2), (S),

(i) The local decision $a^*(\pi, y) = \arg\min_a c_a' B_y P'\pi$ is increasing in $y$.

(ii) $a^*(\pi, y)$ is MLR increasing in $\pi$, i.e., $\pi \geq_r \bar{\pi} \implies a^*(\pi, y) \ge a^*(\bar{\pi}, y)$.

(iii) The $Y$ linear hyperplanes $(c_1 - c_2)'B_y P'\pi = 0$, $y = 1, \ldots, Y$ do not intersect within the interior of the belief space $\Pi(X)$. Thus, out of the $2^Y$ polytopes in (30), there are a maximum of $Y + 1$ non-empty polytopes in $\Pi(X)$.

(iv) Let $i^*_y = \max\{i : e_i \in \{\pi : (c_1 - c_2)'B_y P'\pi < 0\}\}$. Then each of the $Y$ hyperplanes $(c_1 - c_2)'B_y P'\pi = 0$, $y = 1, \ldots, Y$ partitions $\Pi(X)$ such that the vertices $e_1, e_2, \ldots, e_{i^*_y}$ lie in the convex polytope $(c_1 - c_2)'B_y P'\pi < 0$ and the vertices $e_{i^*_y + 1}, \ldots, e_X$ lie in the convex polytope $(c_1 - c_2)'B_y P'\pi \ge 0$.

(v) $i^*_y$ decreases with $y$.

(vi) $M^\pi$ defined in (13) has the following structure:

$M^\pi = \begin{bmatrix} \mathbf{1}_{Y-l+1} & \mathbf{0}_{Y-l+1} \\ \mathbf{0}_{l-1} & \mathbf{1}_{l-1} \end{bmatrix}$ for $\pi \in \mathcal{P}_l$, $l = 1, \ldots, Y + 1$. (63)



Proof: (i) From [33, Lemma 1.2(1)], if $B$ and $P$ are TP2 (i.e., (A1), (A2) hold) then $\frac{B_y P'\pi}{\mathbf{1}'B_y P'\pi} \leq_r \frac{B_{y+1} P'\pi}{\mathbf{1}'B_{y+1} P'\pi}$. Next, MLR dominance implies first order stochastic dominance. Then since from (S), $c(i, 1) - c(i, 2)$ is increasing in $i$, it follows that $\frac{(c_1 - c_2)'B_y P'\pi}{\mathbf{1}'B_y P'\pi} \le \frac{(c_1 - c_2)'B_{y+1} P'\pi}{\mathbf{1}'B_{y+1} P'\pi}$. Since the denominators are non-negative, this implies that $(c_1 - c_2)'B_y P'\pi \ge 0 \implies (c_1 - c_2)'B_{y+1} P'\pi \ge 0$. That is, the single crossing condition (36) holds, see Definition 5. So $a^*(\pi, y)$ is increasing in $y$.

(ii) To prove that $a^*(\pi, y)$ is increasing in $\pi$ with respect to $\geq_r$, we use a similar approach to Part (i). From [33, Lemma 1.2(2)], assuming (A3), $\pi \leq_r \bar{\pi}$ implies $\frac{B_y P'\pi}{\mathbf{1}'B_y P'\pi} \leq_r \frac{B_y P'\bar{\pi}}{\mathbf{1}'B_y P'\bar{\pi}}$. As in the proof above, using (S) this implies $\frac{(c_1 - c_2)'B_y P'\pi}{\mathbf{1}'B_y P'\pi} \le \frac{(c_1 - c_2)'B_y P'\bar{\pi}}{\mathbf{1}'B_y P'\bar{\pi}}$. Since the denominators are non-negative, this implies that

$\pi \leq_r \bar{\pi}$ and $(c_1 - c_2)'B_y P'\pi \ge 0 \implies (c_1 - c_2)'B_y P'\bar{\pi} \ge 0.$ (64)

That is, a single crossing condition (see Definition 5) holds with respect to $(\pi, a)$ and the partial order $\geq_r$. So $a^*(\pi, y)$ is increasing in $\pi$.

(iii) follows immediately from (36).

(iv) Since $i^*_y = \max\{i : e_i \in \{\pi : (c_1 - c_2)'B_y P'\pi < 0\}\}$, clearly $e_{i^*_y + 1} \in \{\pi : (c_1 - c_2)'B_y P'\pi \ge 0\}$. Next, since $e_{i^*_y + 1} \leq_r e_{i^*_y + 2} \leq_r \cdots \leq_r e_X$, the single crossing condition (64) yields $(c_1 - c_2)'B_y P'e_{i^*_y + 2} \ge 0, \ldots, (c_1 - c_2)'B_y P'e_X \ge 0$.

(v) Start with the single crossing condition (36), repeated below for clarity:

$\{\pi : (c_1 - c_2)'B_{y+1} P'\pi < 0\} \subseteq \{\pi : (c_1 - c_2)'B_y P'\pi < 0\}.$

Therefore $\max\{i : e_i \in \{\pi : (c_1 - c_2)'B_{y+1} P'\pi < 0\}\} \le \max\{i : e_i \in \{\pi : (c_1 - c_2)'B_y P'\pi < 0\}\}$.

(vi) follows by enumerating all matrices $M^\pi$ that satisfy (i) and (ii); see (33) for an example.
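Statements (i) and (ii) can be illustrated numerically. The sketch below computes the myopic local decision for hypothetical TP2 matrices `B` and `P` and a hypothetical cost difference vector `COST_DIFF` $= c_1 - c_2$ with increasing elements (in the spirit of assumption (S)), and tabulates the decision over observations and over beliefs on a line $l(e_X, \bar{\pi})$. Ties are broken toward action 2, matching the convention in Statement (iv).

```python
# Hypothetical TP2 observation and transition matrices (state 1 is absorbing).
B = [[0.7, 0.2, 0.1],
     [0.3, 0.4, 0.3],
     [0.1, 0.2, 0.7]]
P = [[1.0, 0.0, 0.0],
     [0.1, 0.9, 0.0],
     [0.0, 0.1, 0.9]]
COST_DIFF = [-2.0, 0.0, 2.0]      # c(i,1) - c(i,2), increasing in the state index i


def local_decision(pi, y):
    """Return the action a in {1, 2} minimizing c_a' B_y P' pi."""
    X = len(pi)
    pred = [sum(P[i][j] * pi[i] for i in range(X)) for j in range(X)]   # P' pi
    d = sum(COST_DIFF[x] * B[x][y] * pred[x] for x in range(X))         # (c_1 - c_2)' B_y P' pi
    return 2 if d >= 0 else 1


def belief_on_line(eps):
    """Point (1 - eps) * pi_bar + eps * e_3 with the hypothetical anchor pi_bar = [0.6, 0.4, 0]."""
    return [0.6 * (1.0 - eps), 0.4 * (1.0 - eps), eps]


# decisions[k][y]: local decision at the k-th belief on the line, for observation y.
decisions = [[local_decision(belief_on_line(k / 10), y) for y in range(3)]
             for k in range(11)]
for k, row in enumerate(decisions):
    print(k / 10, row)
```

Each printed row is nondecreasing in the observation index, and each column is nondecreasing along the line, which is the monotone structure that produces the block form of $M^\pi$ in Statement (vi).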



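D. Proof of Theorem 3

Similar to the example given below Theorem 2, it can be verified from (13) that there are only 3 possible values for $R^\pi$, namely,

$R^\pi = \begin{bmatrix} 1 & 0 \\ 1 & 0 \end{bmatrix},\ \pi \in \mathcal{P}_1, \quad R^\pi = B,\ \pi \in \mathcal{P}_2 \quad \text{and} \quad R^\pi = \begin{bmatrix} 0 & 1 \\ 0 & 1 \end{bmatrix},\ \pi \in \mathcal{P}_3.$ (65)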



Thus Bellman's equation (25) reads

$V(\pi) = \min\{C(\pi, 2) + \rho V(\pi) I(\pi \in \mathcal{P}_1) + \rho \sum_a V(T^\pi(\pi, a))\sigma(\pi, a) I(\pi \in \mathcal{P}_2) + \rho V(\pi) I(\pi \in \mathcal{P}_3),\ 0\}.$ (66)

Claim (i): For $\pi \in \mathcal{P}_1 \cup \mathcal{P}_3$, $V(\pi) = \min\{C(\pi, 2) + \rho V(\pi), 0\}$. This can be solved explicitly as $V(\pi) = \min\{C(\pi, 2)/(1 - \rho), 0\}$, implying

$\mu^*(\pi) = 1$ if $C(\pi, 2) \ge 0$, and $\mu^*(\pi) = 2$ if $C(\pi, 2) < 0$.

Since $C(\pi, 2)$ is MLR decreasing in $\pi$, the optimal policy for $\pi \in \mathcal{P}_1 \cup \mathcal{P}_3$ is a threshold policy with threshold at $C(\pi^*, 2) = 0$. This proves the first claim of the theorem.

Claim (ii): Since $T^\pi(\pi, a) = \pi$ for $\pi \in \mathcal{P}_1 \cup \mathcal{P}_3$, the private belief state update freezes in these regions, i.e., $\pi_{k-1} \in \mathcal{P}_1 \cup \mathcal{P}_3$ implies that $\eta_k = \pi_{k-1}$. Therefore all agents take the same local decision $a$, implying an information cascade.
Claim (iii): The proof of this is more involved.

We need the following property of the social learning Bayesian filter, which is a detailed version of Lemma 2 in Sec. IV-B. Since we are going to partition $\Pi(X)$ into four intervals, namely $[0, \eta_2(2))$, $[\eta_2(2), q(2))$, $[q(2), \eta_1(2))$ and $[\eta_1(2), 1]$, it is convenient to introduce the following notation: Denote these intervals as $P_1, P_2, P_3, P_4$, respectively. Note $\mathcal{P}_1 = P_1$, $\mathcal{P}_2 = P_2 \cup P_3$, $\mathcal{P}_3 = P_4$.

Lemma 2 (Detailed version). Consider the social learning Bayesian filter (11). Then $T^{\eta_1}(\eta_1, 1) = q$, $T^{\eta_2}(\eta_2, 2) = q$. Furthermore, if $B$ is symmetric TP2, then $T^q(q, 2) = \eta_1$, $T^q(q, 1) = \eta_2$ and $\eta_2 \leq_r q \leq_r \eta_1$. So

(i) $\pi \in P_2$ implies $T^\pi(\pi, 2) \in P_1$ and $T^\pi(\pi, 1) \in P_3$.

(ii) $\pi \in P_3$ implies $T^\pi(\pi, 2) \in P_2$ and $T^\pi(\pi, 1) \in P_4$.

The proof of Lemma 2 is as follows. Recall from (65) that on interval $\mathcal{P}_2$, $R^\pi = B$. Then it is straightforwardly verified from (11) that $T^{\eta_1}(\eta_1, 1) = T^{\eta_2}(\eta_2, 2) = q$. Next, it follows that $B_{12}B_{11} = B_{22}B_{21}$ is a sufficient condition for $T^q(q, 2) = \eta_1$ and $T^q(q, 1) = \eta_2$. Also, since by (A1) $B$ is TP2, applying Theorem 10(2) implies $\eta_2 \leq_r q \leq_r \eta_1$. So $B$ symmetric TP2 is sufficient for the claims of the lemma to hold. Statements (i) and (ii) then follow straightforwardly. In particular, from Theorem 10(2), $\eta_1 \geq_r \pi \geq_r q$ implies $T^{\eta_1}(\eta_1, 1) = q \geq_r T^\pi(\pi, 1) \geq_r T^q(q, 1) = \eta_2$, which implies Statement (i) of the lemma. Statement (ii) follows similarly.

Returning to the proof of Theorem 3, we use mathematical induction on the value iteration algorithm (27). Clearly $V_0(\pi) = -C(\pi, 1)$ is linear. Assume now that $V_k(\pi)$ is piecewise linear and concave on each of the four intervals $P_1, \ldots, P_4$. That is, for two dimensional vectors $\gamma_{m_l}$ in the set $\Gamma_l$,

$V_k(\pi) = \sum_l \min_{m_l \in \Gamma_l} \gamma_{m_l}'\pi\, I(\pi \in P_l).$

Consider $\pi \in P_2$. From (65), since $R^\pi_a = B_a$, $a = 1, 2$, Lemma 2(i) together with the value iteration algorithm (66) yields

$V_{k+1}(\pi) = \min\Big\{C(\pi, 2) + \rho\Big[\min_{m_3 \in \Gamma_3} \gamma_{m_3}'B_1 P'\pi + \min_{m_1 \in \Gamma_1} \gamma_{m_1}'B_2 P'\pi\Big],\ 0\Big\}.$

Note the crucial point in the above equation: as a result of Lemma 2(i), the social learning filter maps $P_2$ to only $P_1$ (for $a = 1$) and $P_3$ (for $a = 2$). Since each of the terms in the above equation is piecewise linear and concave, it follows that $V_{k+1}(\pi)$ is piecewise linear and concave on $P_2$. A similar proof holds for $P_3$, and this involves using Lemma 2(ii). As a result, the stopping set on each interval $P_l$, $l = 1, \ldots, 4$ is a convex region, i.e., an interval. This proves claim (iii).

E. Proof of Lemma 3 and Theorem 4

Proof of Lemma 3: Let us introduce the following notation. Define

$\mathcal{S}^{+} = \{\pi : C'\pi \ge 0\}$ and $\mathcal{S}^{=} = \{\pi : C'\pi = 0\}.$ (67)

The proof comprises three parts.

Statement (i): Under (PH), for every $\pi \in \mathcal{S}^{+}$, there exists a $\bar{\pi} \in \mathcal{S}^{=}$ such that $\bar{\pi} \geq_r \pi$.

Proof. Consider any belief state $\pi \in \mathcal{S}^{+}$. Construct a line segment from $e_1$ through the belief state $\pi$ and let this line segment intersect the hyperplane $\mathcal{S}^{=}$. Denote $\bar{\pi}$ as this point of intersection. Clearly $\bar{\pi} = \alpha e_1 + (1 - \alpha)\pi$ where $\alpha = C'\pi/(C'\pi - C_1)$. It is straightforwardly established that $\bar{\pi} \geq_r \pi$ if $\alpha \ge 1$, which is clearly true since $C_1 \ge 0$ and $C'\pi \ge 0$ for $\pi \in \mathcal{S}^{+}$.

Statement (ii): Under (A1), (A2), (A3), if $\bar{\pi} \geq_r \pi$, then $C'T^{\bar{\pi}}(\bar{\pi}, a) \ge 0 \implies C'T^\pi(\pi, a) \ge 0$.

Proof. Under (A1), (A2), (A3), it follows from Theorem 10(2) in Appendix F that $T^\pi(\pi, a)$ is MLR increasing, that is, $\bar{\pi} \geq_r \pi$ implies $T^{\bar{\pi}}(\bar{\pi}, a) \geq_r T^\pi(\pi, a)$. Under (A3), the elements of $C$ are decreasing. So from Result 1 in Appendix A it follows that $C'T^{\bar{\pi}}(\bar{\pi}, a) \le C'T^\pi(\pi, a)$. So $C'T^{\bar{\pi}}(\bar{\pi}, a) \ge 0$ implies $C'T^\pi(\pi, a) \ge 0$.

Statements (i) and (ii) imply that if the social learning filter (11) maps belief states in $\mathcal{S}^{=}$ to $\mathcal{S}$, then all belief states in $\{\pi : C'\pi \ge 0\}$ are also mapped to $\mathcal{S}$. Since the hyperplane $\mathcal{S}^{=} = \{\pi : C'\pi = 0\}$ has infinitely many points, how can we formulate a sufficient condition for belief states $\{\pi : C'\pi = 0\}$ to be mapped to the polytope $\mathcal{P}_{Y+1}$? (C1) serves as a sufficient condition, as proved in Statement (iii) below.

Statement (iii): A sufficient condition for $C'T^{\bar{\pi}}(\bar{\pi}, a) \le 0$ to hold for all $\bar{\pi} \in \mathcal{S}^{=}$ is that $C'T^{v_i}(v_i, a) \le 0$ for all $X - 1$ vertices $v_i$ of (45).

Proof: Clearly every belief state $\pi \in \mathcal{S}^{=}$ is a convex combination of the vertices, i.e., $\pi = \sum_i \alpha_i v_i$, for some $\alpha_i \ge 0$ and $\sum_i \alpha_i = 1$. Now $C'T^{v_i}(v_i, a) \le 0$ is equivalent to $C'R^{v_i}P'v_i \le 0$ since the normalization term in $T^\pi(\cdot)$ is non-negative. This implies $C'\sum_i \alpha_i R^{v_i}P'v_i \le 0$, and this is equivalent to $C'R^\pi P'\pi \le 0$.

Proof of Theorem 4: Define $\mathcal{S} = \{\pi : C'\pi \ge 0\}$.

Step 1: We first prove that $V(\pi) = 0$ for $\pi \in \mathcal{S}$. This is equivalent to saying that for $\{\pi : C'\pi \ge 0\}$, the optimal policy is $\mu^*(\pi) = 1$.

The proof of Step 1 is by induction on the value iteration algorithm (27). Suppose $V_0(\pi) = 0$. Then it trivially satisfies $V_0(\pi) = 0$ for $\pi \in \mathcal{S}$. Next suppose $V_k(\pi) = 0$ for $\pi \in \mathcal{S}$. Then for $\pi \in \mathcal{S}$, Assumption (C1) implies that $T^\pi(\pi, a)$ belongs to $\mathcal{S}$, implying that $V_k(T^\pi(\pi, a)) = 0$. So from (27), it follows that $V_{k+1}(\pi) = \min\{C'\pi, 0\} = 0$ since $C'\pi \ge 0$ for $\pi \in \mathcal{S}$. Since $V_k(\pi)$ converges pointwise to $V(\pi)$, Step 1 follows. For initial condition $V_0(\pi) = -C(\pi, 1)$ (see (27)), $V(\pi)$ obtained as the limit of the value iteration algorithm is identical to that with initial condition $V_0(\pi) = 0$.

Step 2: From Bellman's equation it follows trivially that for $\{\pi : C'\pi < 0\}$, $\mu^*(\pi) = 2$.

From Steps 1 and 2, we have $C'\pi \ge 0$ iff $\mu^*(\pi) = 1$.

F. Proof of Theorem 5

This section is in two parts. We start with several preliminary results that are similar to the results in [33]. Then the proof of Theorem 5 is presented.

1) Structural Properties of Social Learning Filter:

Theorem 10. The following structural properties hold for the public belief update evaluated by the social learning Bayesian filter defined in (11):

1) Under (S), $M^\pi$ is TP2 for $\pi \in \Pi(X)$, see Definition 6.

2) Under (A1), (A2), (S), if $\pi_1, \pi_2 \in \mathcal{P}_l$, then $\pi_1 \geq_r \pi_2$ implies $T^{\pi_1}(\pi_1, a) \geq_r T^{\pi_2}(\pi_2, a)$.

3) Under (A1), (A2), (S), if $\pi_1, \pi_2 \in \Pi(X)$, then $\pi_1 \geq_r \pi_2 \implies \sigma(\pi_1, \cdot) \geq_s \sigma(\pi_2, \cdot)$.

4) Under (A1), (A2), if $\pi \in \Pi(X)$, then $a > \bar{a}$ implies $T^\pi(\pi, a) \geq_r T^\pi(\pi, \bar{a})$.

Proof: 1) We need to show that for fixed $\pi \in \Pi(X)$,

$M^\pi_{y,a} M^\pi_{y',a'} \le M^\pi_{y \wedge y',\, a \wedge a'} M^\pi_{y \vee y',\, a \vee a'}.$ (68)

Recall from (13) that $M^\pi$ is a matrix with a single 1 in each row at $a^*(\pi, y)$ and all other elements zero. So the only non-trivial case to prove is when both terms on the LHS are 1, i.e., $M^\pi_{y, a^*(\pi, y)} = 1$ and $M^\pi_{y', a^*(\pi, y')} = 1$. Assuming (A2), (S), Theorem 2(i) says that $a^*(\pi, y)$ is increasing in $y$. This means that $y \le y' \implies a^*(\pi, y) \le a^*(\pi, y')$ and $y \ge y' \implies a^*(\pi, y) \ge a^*(\pi, y')$. In either case (68) holds with equality since the RHS is identical to the LHS.

2) Since $P$ is TP2 (A2), we have $P'\pi \geq_r P'\bar{\pi}$ for $\pi \geq_r \bar{\pi}$, see [23]. So it suffices to show that $\frac{R^\pi_a \pi}{\mathbf{1}'R^\pi_a \pi} \geq_r \frac{R^{\bar{\pi}}_a \bar{\pi}}{\mathbf{1}'R^{\bar{\pi}}_a \bar{\pi}}$ for $\pi \geq_r \bar{\pi}$. Moreover, since $\pi, \bar{\pi}$ belong to the same polytope $\mathcal{P}_l$, $R^\pi_a = R^{\bar{\pi}}_a = R^l_a$ (say), see (30). From [33], a sufficient condition for $\frac{R^l_a \pi}{\mathbf{1}'R^l_a \pi} \geq_r \frac{R^l_a \bar{\pi}}{\mathbf{1}'R^l_a \bar{\pi}}$ is that $R^l$ is TP2. Of course we need this to hold on each of the $Y + 1$ polytopes, i.e., for $l = 1, \ldots, Y + 1$.

So under what conditions is $R^\pi$ TP2 in each of the $Y + 1$ polytopes? Note (A2) says $B$ is TP2. Since $R^\pi = BM^\pi$ and the product of TP2 matrices is TP2, it only remains to prove that $M^\pi$ is TP2. This follows from (S), as proved in 1) above.

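3) Since $P$ is TP2 (A2), it suffices to prove that $\pi \geq_r \bar{\pi}$ implies $\mathbf{1}'R^\pi_a \pi \geq_s \mathbf{1}'R^{\bar{\pi}}_a \bar{\pi}$, i.e., $\sum_i \sum_{a \ge \bar{a}} R^\pi_{ia} \pi_i \ge \sum_i \sum_{a \ge \bar{a}} R^{\bar{\pi}}_{ia} \bar{\pi}_i$. From Statement 1, $M^\pi$ and $M^{\bar{\pi}}$ are TP2, and from (A2) $B$ is TP2. So $R^\pi = BM^\pi$ and $R^{\bar{\pi}} = BM^{\bar{\pi}}$ are TP2. Therefore, from Definition 6(iii), the rows of $R^\pi$ and $R^{\bar{\pi}}$ are MLR increasing. Since MLR dominance implies first order stochastic dominance, this means that both $\sum_{a \ge \bar{a}} R^\pi_{ia}$ and $\sum_{a \ge \bar{a}} R^{\bar{\pi}}_{ia}$ are increasing with $i$. Since $\pi \geq_r \bar{\pi}$, Result 1(i), (ii) and (iii) imply that a sufficient condition for $\pi \geq_r \bar{\pi} \implies \mathbf{1}'R^\pi_a \pi \geq_s \mathbf{1}'R^{\bar{\pi}}_a \bar{\pi}$ is that $\sum_{a \ge \bar{a}} R^\pi_{ia} \ge \sum_{a \ge \bar{a}} R^{\bar{\pi}}_{ia}$, or equivalently, $\sum_y B_{iy} \sum_{a \ge \bar{a}} M^\pi_{ya} \ge \sum_y B_{iy} \sum_{a \ge \bar{a}} M^{\bar{\pi}}_{ya}$. A sufficient condition for this is $\pi \geq_r \bar{\pi} \implies \sum_{a \ge \bar{a}} M^\pi_{ya} \ge \sum_{a \ge \bar{a}} M^{\bar{\pi}}_{ya}$. But this condition holds from the structure of $M^\pi$ in (63) and the fact that $a^*(\pi, y)$ is MLR increasing with respect to $\pi$ (Statement (ii) of Theorem 2 in Appendix C).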

4) Since $P$ is TP2 (A2), it suffices to prove that $\frac{R^\pi_a \pi}{\mathbf{1}'R^\pi_a \pi} \geq_r \frac{R^\pi_{a'} \pi}{\mathbf{1}'R^\pi_{a'} \pi}$ for $a > a'$. Since $R^\pi$ is TP2 (A1), this result follows straightforwardly from [51, Theorem 4].

2) Proof of Theorem 5: Here we prove Theorem 5. The update of the belief state in $\mathcal{P}_{Y+1}$ is simple, since $R^\pi_{ia} = 1/X$ (uniformly distributed) for each $i$; see (34) for example. In comparison, the sensor management case of Theorem 8 on $\mathcal{P}_2$ with update given by (60) requires an arbitrary TP2 matrix $R^\pi$. To allow for this generality, in the proof below, we assume $R^\pi$ is an arbitrary TP2 matrix on $\mathcal{P}_{Y+1}$.

Part 1: Under (A1), (A2), (A3), (S), (C3), (C2), (PH), $V(\pi)$ is MLR decreasing on polytope $\mathcal{P}_{Y+1}$.

The proof of Part 1 is by mathematical induction on the value iteration algorithm (27). Start with $V_0(\pi) = -C(\pi, 1)$ in (27). Clearly this is MLR decreasing on $\Pi(X)$ and therefore on polytope $\mathcal{P}_{Y+1}$, since $f$ is chosen with increasing elements. Now for the inductive step: Assume at iteration $k$, $V_k(\pi)$ is MLR decreasing on polytope $\mathcal{P}_{Y+1}$. Then since $T^\pi(\pi, a)$ is MLR increasing in $a$ (Theorem 10(4)) and $T^\pi(\pi, a) \in \mathcal{P}_{Y+1}$ by (C2), it follows that $V_k(T^\pi(\pi, 1)) \ge V_k(T^\pi(\pi, 2))$. Consider any $\pi \geq_r \bar{\pi} \in \mathcal{P}_{Y+1}$. Since $\sigma(\pi, \cdot) \geq_s \sigma(\bar{\pi}, \cdot)$ (see Theorem 10(3)),

$\sum_a V_k(T^\pi(\pi, a))\sigma(\pi, a) \le \sum_a V_k(T^\pi(\pi, a))\sigma(\bar{\pi}, a).$ (69)

Next, since $\pi \geq_r \bar{\pi} \implies T^\pi(\pi, a) \geq_r T^{\bar{\pi}}(\bar{\pi}, a)$ (Theorem 10(2)), $V_k(\pi)$ MLR decreasing in $\pi$ implies $V_k(T^\pi(\pi, a)) \le V_k(T^{\bar{\pi}}(\bar{\pi}, a))$. So from (69), $\pi \geq_r \bar{\pi}$ implies

$\sum_a V_k(T^\pi(\pi, a))\sigma(\pi, a) \le \sum_a V_k(T^\pi(\pi, a))\sigma(\bar{\pi}, a) \le \sum_a V_k(T^{\bar{\pi}}(\bar{\pi}, a))\sigma(\bar{\pi}, a).$ (70)

From (A3), $C(\pi, 2)$ is MLR decreasing. So $\pi \geq_r \bar{\pi}$ implies $C(\pi, 2) \le C(\bar{\pi}, 2)$. Therefore $\pi \geq_r \bar{\pi}$ implies $Q_{k+1}(\pi, 2) \le Q_{k+1}(\bar{\pi}, 2)$. Thus $\min_u Q_{k+1}(\pi, u) \le \min_u Q_{k+1}(\bar{\pi}, u)$, i.e., $V_{k+1}(\pi) \le V_{k+1}(\bar{\pi})$. This completes the induction step. Finally, since $V_k \to V$ as $k \to \infty$ pointwise (see discussion below (27)), $V$ is MLR decreasing on polytope $\mathcal{P}_{Y+1}$.

Part 2: Under the above conditions, $\mu^*(\pi)$ is MLR increasing on polytope $\mathcal{P}_{Y+1}$. It suffices to show that $Q(\pi, u)$ is submodular (see Definition 4) on $\mathcal{P}_{Y+1}$ with respect to the MLR ordering, since then Theorem 9 applies, implying that $\mu^*(\pi)$ is MLR increasing in $\pi \in \mathcal{P}_{Y+1}$. To show that $Q(\pi, u)$ in (25) is submodular, we need to show that $Q(\pi, 2)$ is MLR decreasing in $\pi$. But this follows from (A3) and Part 1. Thus, by Theorem 9, the result holds.



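G. Proof of Theorem 7

Given any $\pi_1, \pi_2 \in l(e_X, \bar{\pi})$ with $\pi_2 \geq_{L_X} \pi_1$, we need to prove: $\mu_\theta(\pi_1) \le \mu_\theta(\pi_2)$ iff $\theta(X-2) \ge 1$, $\theta(i) \le \theta(X-2)$ for $i < X-2$. But from the structure of (53), obviously $\mu_\theta(\pi_1) \le \mu_\theta(\pi_2)$ is equivalent to

$\begin{bmatrix} 0 & 1 & \theta' \end{bmatrix}\pi_1 \le \begin{bmatrix} 0 & 1 & \theta' \end{bmatrix}\pi_2$, or equivalently, $\begin{bmatrix} 0 & 1 & \theta(1) & \cdots & \theta(X-2) \end{bmatrix}(\pi_1 - \pi_2) \le 0.$

Now from Lemma 5(iii), $\pi_2 \geq_{L_X} \pi_1$ implies that $\pi_1 = \epsilon_1 e_X + (1 - \epsilon_1)\bar{\pi}$, $\pi_2 = \epsilon_2 e_X + (1 - \epsilon_2)\bar{\pi}$ and $\epsilon_1 \le \epsilon_2$. Substituting these into the above expression, we need to prove

$(\epsilon_1 - \epsilon_2)\Big(\theta(X-2) - \begin{bmatrix} 0 & 1 & \theta(1) & \cdots & \theta(X-2) \end{bmatrix}\bar{\pi}\Big) \le 0, \quad \forall \bar{\pi} \in \mathcal{H}_X$

iff $\theta(X-2) \ge 1$, $\theta(i) \le \theta(X-2)$, $i < X-2$. This is obviously true.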

A similar proof shows that on lines $l(e_1, \bar{\pi})$ the linear threshold policy satisfies $\mu_\theta(\pi_1) \le \mu_\theta(\pi_2)$ iff $\theta(i) \ge 0$ for $i < X-2$.
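The condition on $\theta$ can be checked numerically by evaluating the linear form along lines $l(e_X, \bar{\pi})$. The sketch below assumes that the policy depends on $\pi$ only through the linear score with coefficient vector $[0, 1, \theta(1), \ldots, \theta(X-2)]$, as in the argument above; the dimension $X = 4$, the two parameter vectors and the function names are hypothetical.

```python
def score(theta, pi):
    """Linear form [0, 1, theta(1), ..., theta(X-2)] applied to the belief pi."""
    coeffs = [0.0, 1.0] + list(theta)
    return sum(c * p for c, p in zip(coeffs, pi))


def monotone_on_line(theta, pi_bar, steps=10):
    """True if the score is nondecreasing along the line from pi_bar to e_X."""
    values = []
    for k in range(steps + 1):
        eps = k / steps
        pi = [(1.0 - eps) * b for b in pi_bar]
        pi[-1] += eps                       # pi = (1 - eps) * pi_bar + eps * e_X
        values.append(score(theta, pi))
    return all(v2 >= v1 - 1e-12 for v1, v2 in zip(values, values[1:]))


# Hypothetical anchors in H_X for X = 4 (last component zero).
anchors = [[1.0, 0.0, 0.0, 0.0],
           [0.0, 1.0, 0.0, 0.0],
           [0.0, 0.0, 1.0, 0.0],
           [0.2, 0.5, 0.3, 0.0]]

theta_ok = [0.5, 1.5]     # theta(X-2) = 1.5 >= 1 and theta(1) <= theta(X-2)
theta_bad = [2.0, 0.5]    # violates both conditions

ok_result = all(monotone_on_line(theta_ok, a) for a in anchors)
bad_result = all(monotone_on_line(theta_bad, a) for a in anchors)
print(ok_result, bad_result)
```

With the admissible parameter vector the score increases along every line toward $e_X$, so a threshold on the score yields a policy that is monotone on these lines; with the violating vector, monotonicity fails on at least one line.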